Unico {Unico} | R Documentation
Fitting the Unico model
Description
Fits the Unico model for an input matrix of features by observations that come from a mixture of k sources, under the assumption that each observation is a mixture of unique (unobserved) source-specific values (in each feature in the data). Specifically, for each feature, it standardizes the data and learns the source-specific means and the full k by k variance-covariance matrix.
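The standardization step can be pictured with a short base-R sketch. It assumes that standardizing means dividing each feature (row) by its standard deviation across observations, which would be consistent with the scale.factor entry listed under Value. Whether Unico also centers the data is not stated here, so the sketch does not center. The helper standardize_features is hypothetical and not part of the package.

```r
# Hypothetical helper (not part of Unico): rescale each feature (row) of a
# features-by-observations matrix to unit standard deviation.
# Assumption: standardization = dividing by the per-feature SD; no centering.
standardize_features <- function(X) {
  # per-feature standard deviation across observations (columns)
  scale_factor <- apply(X, 1, sd)
  # dividing by a vector of length nrow(X) rescales row i by scale_factor[i]
  X_std <- X / scale_factor
  list(X = X_std, scale_factor = scale_factor)
}

set.seed(1)
# two features with very different spreads (SD 1 and SD 10), 50 observations
X <- matrix(rnorm(2 * 50, sd = c(1, 10)), nrow = 2)
out <- standardize_features(X)
apply(out$X, 1, sd)  # both rows now have standard deviation 1
```

Keeping the per-feature factors makes it possible to map estimates back to the original scale of each feature.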
Usage
Unico(
X,
W,
C1,
C2,
fit_tau = FALSE,
mean_penalty = 0,
var_penalty = 0.01,
covar_penalty = 0.01,
mean_max_iterations = 2,
var_max_iterations = 3,
nloptr_opts_algorithm = "NLOPT_LN_COBYLA",
max_stds = 2,
init_weight = "default",
max_u = 1,
max_v = 1,
parallel = TRUE,
num_cores = NULL,
log_file = "Unico.log",
verbose = FALSE,
debug = FALSE
)
Arguments
X: An input matrix of features by observations (one row per feature, one column per observation); each observation is assumed to be a mixture of k sources.
W: A matrix of observations by k sources holding the weights w_i of each observation; the weights must be non-negative and each row must sum to 1.
C1: A design matrix of observations by source-level covariates c_i^{(1)}, i.e. covariates that systematically affect the source-specific values.
C2: A design matrix of observations by global-level covariates c_i^{(2)}, i.e. covariates that systematically affect the observed mixture values.
fit_tau: A logical value indicating whether to fit the standard deviation of the measurement noise (i.e. the i.i.d. component of variation in the model, denoted as \tau).
mean_penalty: A non-negative numeric value indicating the regularization strength on the source-specific mean estimates.
var_penalty: A non-negative numeric value indicating the regularization strength on the diagonal entries of the full k by k variance-covariance matrix.
covar_penalty: A non-negative numeric value indicating the regularization strength on the off-diagonal entries of the full k by k variance-covariance matrix.
mean_max_iterations: A non-negative numeric value indicating the number of iterative updates performed on the mean estimates.
var_max_iterations: A non-negative numeric value indicating the number of iterative updates performed on the variance-covariance matrix.
nloptr_opts_algorithm: A string indicating the optimization algorithm to use (default "NLOPT_LN_COBYLA").
max_stds: A non-negative numeric value that determines, for each feature, which portion of the data is considered outliers. Only samples within max_stds standard deviations of the mean are used for inferring the parameters.
init_weight: A string indicating the initial weights on the samples used to start the iterative optimization.
max_u: A non-negative numeric value indicating the maximum weight (influence) a single sample can have on the mean estimates.
max_v: A non-negative numeric value indicating the maximum weight (influence) a single sample can have on the variance-covariance estimates.
parallel: A logical value indicating whether to use parallel computing (possible on a multi-core machine).
num_cores: A numeric value indicating the number of cores to use (used only if parallel is TRUE).
log_file: A path to an output log file. Note that if the file …
verbose: A logical value indicating whether to print logs.
debug: A logical value indicating whether to set the logger to a more detailed debug level; set …
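The max_stds argument can be illustrated with a small sketch for a single feature. It assumes the rule is a plain z-score cutoff around the feature's mean; the exact rule Unico applies (for example, whether it is computed before or after standardization) is not specified here, and within_max_stds is a hypothetical helper.

```r
# Hypothetical max_stds-style filter for one feature (not Unico's code).
# Assumption: a sample is kept if it lies within max_stds standard
# deviations of the feature's mean.
within_max_stds <- function(x, max_stds = 2) {
  z <- (x - mean(x)) / sd(x)  # distance from the mean in units of SD
  abs(z) <= max_stds          # TRUE for samples that are retained
}

x <- c(rep(0, 20), 100)  # one extreme value among 20 zeros
keep <- within_max_stds(x, max_stds = 2)
sum(keep)  # only the extreme value is flagged
```

Larger values of max_stds retain more samples; smaller values discard more of the tails.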
Details
Unico assumes the following model:
X_{ij} = w_i^T Z_{ij} + (c_i^{(2)})^T \beta_j + e_{ij}
The mixture value at sample i and feature j, X_{ij}, is modeled as a weighted linear combination of a total of k source-specific levels Z_{ij} = (Z_{ij1}, ..., Z_{ijk}), with weights w_i = (w_{i1}, ..., w_{ik}). In addition, we consider global-level covariates c_i^{(2)} that systematically affect the observed mixture values, with effect sizes \beta_j. The term e_{ij} denotes the i.i.d. measurement noise, with variance \tau across all samples.
Weights have to be non-negative and sum to 1 across all sources for each sample. In practice, we assume that the weights are fixed, having been estimated by external methods.
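To make the model concrete, the following base-R sketch simulates one feature j from X_{ij} = w_i^T Z_{ij} + (c_i^{(2)})^T \beta_j + e_{ij}. The sample size, the distributions of Z and C2, and the parameter values are illustrative assumptions; the package's own simulate_data function is not reproduced here. Following the text above, \tau is treated as the noise variance.

```r
# Illustrative simulation of one feature j under
#   X_ij = w_i^T Z_ij + (c_i^(2))^T beta_j + e_ij
# All sizes and parameter values are assumptions for illustration.
set.seed(42)
n <- 100   # samples
k <- 3     # sources
p2 <- 1    # global-level covariates

# weights: non-negative, each row (sample) normalized to sum to 1
W <- matrix(runif(n * k), nrow = n)
W <- W / rowSums(W)
stopifnot(all(W >= 0), isTRUE(all.equal(rowSums(W), rep(1, n))))

# source-specific levels Z_ij, one column per source (source means 1, 2, 3)
Z <- matrix(rnorm(n * k, mean = c(1, 2, 3)), nrow = n, byrow = TRUE)

C2 <- matrix(rnorm(n * p2), nrow = n)  # global-level covariates c_i^(2)
beta_j <- c(0.5)                       # effect sizes of C2 on feature j
tau <- 0.01                            # noise variance, as in the text
e <- rnorm(n, sd = sqrt(tau))          # i.i.d. measurement noise e_ij

# rowSums(W * Z) computes w_i^T Z_ij for every sample i
X_j <- rowSums(W * Z) + drop(C2 %*% beta_j) + e
```

Only X_j and W would be observed in practice; Z is the latent quantity the model reasons about.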
Source-specific profiles are further modeled as:
Z_{ijh} = \mu_{jh} + (c_i^{(1)})^T \gamma_{jh} + \epsilon_{ijh}
Here \mu_{jh} denotes the population-level mean of feature j in source h. We also consider covariates c_i^{(1)} that systematically affect the source-specific values, with effect sizes \gamma_{jh} on each source. Finally, we explicitly model the k by k covariance structure of each feature j across all k sources: Var[\vec{\epsilon}_{ij}] = \Sigma_j \in \mathbf{R}^{k \times k}, where \vec{\epsilon}_{ij} = (\epsilon_{ij1}, ..., \epsilon_{ijk}).
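The source-specific part of the model, including a full covariance \Sigma_j across sources, can be sketched as follows for one feature. The parameter values are invented, and drawing correlated noise through a Cholesky factor is a standard simulation technique, not a description of how Unico estimates \Sigma_j.

```r
# Sketch: draw source-specific values Z_ij for one feature j under
#   Z_ijh = mu_jh + (c_i^(1))^T gamma_jh + eps_ijh,  Var[eps_ij] = Sigma_j
# All parameter values are made up for illustration.
set.seed(7)
n <- 500; k <- 3; p1 <- 1

mu_j <- c(0, 1, 2)                                   # mean of each source h
gamma_j <- matrix(c(0.3, -0.2, 0.1), nrow = p1, ncol = k)  # C1 effects per source
C1 <- matrix(rnorm(n * p1), nrow = n)                # source-level covariates

Sigma_j <- matrix(c(1.0, 0.5, 0.2,
                    0.5, 1.0, 0.3,
                    0.2, 0.3, 1.0), nrow = k)        # k x k covariance

# chol() returns upper-triangular R with Sigma_j = t(R) %*% R, so the rows of
# G %*% R (G standard normal) have covariance Sigma_j
R <- chol(Sigma_j)
G <- matrix(rnorm(n * k), nrow = n)
eps <- G %*% R

# one row per sample, one column per source
Z <- matrix(mu_j, nrow = n, ncol = k, byrow = TRUE) + C1 %*% gamma_j + eps
```

The off-diagonal entries of Sigma_j are what allow the value of one source to carry information about another.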
Value
A list with the estimated parameters of the model. This list can then be used as input to other functions such as tensor.
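W: The matrix of weights (observations by k sources) used for fitting.
C1: The design matrix of source-level covariates used for fitting.
C2: The design matrix of global-level covariates used for fitting.
mus_hat: A matrix of the estimated source-specific means \mu_{jh}, one row per feature and one column per source.
gammas_hat: A matrix of the estimated effects \gamma_{jh} of the source-level covariates in C1 on each feature in each source.
betas_hat: A matrix of the estimated effects \beta_j of the global-level covariates in C2 on each feature.
sigmas_hat: An array holding the estimated k by k variance-covariance matrix \Sigma_j of each feature.
taus_hat: A matrix of the estimated measurement-noise parameter \tau of each feature.
scale.factor: A matrix of the per-feature scaling factors used to standardize the data.
config: A list with the hyper-parameters used for fitting the model and the configurations of the optimization algorithm.
Us_hat_list: A list tracking, for each feature, the sample weights used at each iteration of the mean optimization (activated only if debug is TRUE).
Vs_hat_list: A list tracking, for each feature, the sample weights used at each iteration of the variance-covariance optimization (activated only if debug is TRUE).
Ls_hat_list: A list tracking, for each feature, the estimated upper-triangular Cholesky factor of the variance-covariance matrix at each iteration of the variance-covariance optimization (activated only if debug is TRUE).
sigmas_hat_list: A list tracking, for each feature, the estimated variance-covariance matrix at each iteration of the variance-covariance optimization (activated only if debug is TRUE).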
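Ls_hat_list stores upper-triangular Cholesky factors, so a covariance matrix can be recovered from such a factor L as \Sigma = L^T L. The sketch below uses a made-up 3 by 3 factor for a single feature and converts the result to correlations between sources with stats::cov2cor. How to extract L (or one feature's matrix from sigmas_hat) from the returned list is not shown, since it depends on the object's layout.

```r
# Made-up upper-triangular Cholesky factor for one feature with k = 3 sources
# (a stand-in for an entry tracked in Ls_hat_list; values are hypothetical).
L <- matrix(c(1.0, 0.5, 0.2,
              0.0, 0.8, 0.3,
              0.0, 0.0, 0.9), nrow = 3, byrow = TRUE)

# for an upper-triangular factor, Sigma = t(L) %*% L, i.e. crossprod(L)
Sigma <- crossprod(L)

sqrt(diag(Sigma))   # source-specific standard deviations
cov2cor(Sigma)      # correlations between sources for this feature
```

Working with the factor rather than Sigma directly guarantees a positive semi-definite result, which is a common reason optimizers parameterize covariances this way.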
Examples
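library(Unico)
# simulate a small dataset (n, m, k, p1 and p2 control its dimensions)
data = simulate_data(n=100, m=2, k=3, p1=1, p2=1, taus_std=0, log_file=NULL)
res = list()
# fit the model sequentially (parallel=FALSE) without a log file
res$params.hat = Unico(data$X, data$W, data$C1, data$C2, parallel=FALSE, log_file=NULL)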