tensor {Unico}    R Documentation

Inferring the underlying source-specific 3D tensor

Description

Infers the underlying (sources by features by observations) 3D tensor from the observed (features by observations) 2D mixture, under the Unico model assumption that each observation is a mixture of unique source-specific values in each feature of the data. In the context of bulk genomics data containing a mixture of cell types (e.g. the input could be CpG sites by individuals for DNA methylation, or genes by individuals for RNA expression), tensor estimates the cell-type-specific levels for each individual at each CpG site/gene (i.e. a tensor of cell types by CpG sites/genes by individuals).

Usage

tensor(
  X,
  W,
  C1,
  C2,
  Unico.mdl,
  parallel = TRUE,
  num_cores = NULL,
  log_file = "Unico.log",
  verbose = FALSE,
  debug = FALSE
)

Arguments

X

An m by n matrix of measurements of m features for n observations. Each column in X is assumed to be a mixture of k sources. Note that X must include row names and column names and that NA values are currently not supported. X should not include features that are constant across all observations. Note that X could potentially be different from the X used to learn Unico.mdl (i.e. the original observed 2D mixture used to fit the model).

W

An n by k matrix of weights - the weights of k sources for each of the n mixtures (observations). All the weights must be positive and each row - corresponding to the weights of a single observation - must sum up to 1. Note that W must include row names and column names and that NA values are currently not supported.
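The requirements on X and W stated above can be checked before calling tensor. The following is a minimal sketch in base R and is not part of the Unico package; the helper name check_mixture_inputs and the tolerance tol are assumptions made for illustration.

```r
# Hypothetical helper (not part of Unico): checks the documented
# requirements on X (m by n) and W (n by k) before calling tensor().
check_mixture_inputs <- function(X, W, tol = 1e-8) {
  stopifnot(is.matrix(X), is.matrix(W))
  # Row names and column names are required for both matrices
  stopifnot(!is.null(rownames(X)), !is.null(colnames(X)))
  stopifnot(!is.null(rownames(W)), !is.null(colnames(W)))
  # NA values are not supported
  stopifnot(!anyNA(X), !anyNA(W))
  # X has one column per observation and W has one row per observation
  stopifnot(ncol(X) == nrow(W))
  # Weights must be positive and each row must sum to 1 (up to tol)
  stopifnot(all(W > 0), all(abs(rowSums(W) - 1) < tol))
  # No feature (row of X) may be constant across all observations
  stopifnot(all(apply(X, 1, function(x) length(unique(x)) > 1)))
  invisible(TRUE)
}

# Toy inputs: 2 features, 4 observations, 3 sources
set.seed(1)
X <- matrix(runif(8), nrow = 2,
            dimnames = list(c("f1", "f2"), paste0("s", 1:4)))
W <- matrix(runif(12), nrow = 4,
            dimnames = list(paste0("s", 1:4), paste0("src", 1:3)))
W <- W / rowSums(W)  # normalize each row to sum to 1
check_mixture_inputs(X, W)
```

The helper stops with an error at the first violated requirement and returns TRUE invisibly otherwise.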

C1

An n by p1 design matrix of covariates that may affect the hidden source-specific values (possibly a different effect size in each source). Note that C1 must include row names and column names and should not include an intercept term. NA values are currently not supported. Note that all covariates in C1 must be present and match the order of the set of covariates in C1 stored in Unico.mdl (i.e. the original set of source-specific covariates available when initially fitting the model).

C2

An n by p2 design matrix of covariates that may affect the mixture (i.e. rather than directly the sources of the mixture; for example, variables that capture biases in the collection of the measurements). Note that C2 must include row names and column names and should not include an intercept term. NA values are currently not supported. Note that all covariates in C2 must be present and match the order of the set of covariates in C2 stored in Unico.mdl (i.e. the original set of not source-specific covariates available when initially fitting the model).

Unico.mdl

The entire set of model parameters estimated by Unico on the 2D mixture matrix (i.e. the list returned by applying the function Unico to X).

parallel

A logical value indicating whether to use parallel computing (possible when using a multi-core machine).

num_cores

A numeric value indicating the number of cores to use (used only if parallel is TRUE). If num_cores is NULL then all available cores except for one will be used.
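The default described for num_cores can be pictured with the base parallel package. This is only a sketch of the stated rule (all available cores except for one), assuming parallel::detectCores() as the source of the core count; how Unico resolves the value internally is not shown here, and resolve_num_cores and use_parallel are hypothetical names.

```r
# Hypothetical helper illustrating the documented default for num_cores
resolve_num_cores <- function(use_parallel = TRUE, num_cores = NULL) {
  if (!use_parallel) return(1L)            # serial execution
  if (is.null(num_cores)) {
    available <- parallel::detectCores()   # may return NA on some platforms
    if (is.na(available)) return(1L)
    return(max(1L, available - 1L))        # all available cores except one
  }
  as.integer(num_cores)
}

resolve_num_cores(use_parallel = TRUE, num_cores = 2)  # 2
resolve_num_cores(use_parallel = FALSE)                # 1
```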

log_file

A path to an output log file. Note that if the file log_file already exists then logs will be appended to the end of the file. Set log_file to NULL to prevent output from being saved into a file; note that if verbose == FALSE then no output file will be generated regardless of the value of log_file.

verbose

A logical value indicating whether to print logs.

debug

A logical value indicating whether to set the logger to a more detailed debug level; set debug to TRUE before reporting issues.

Details

After obtaining all the estimated parameters of the Unico model (by calling Unico), tensor uses the conditional distribution Z_{jh}^i | X_{ij} = x_{ij} to estimate the level of each source h (h = 1, ..., k) for each sample i at each feature j.
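To give a feel for what conditioning on an observed mixture value involves, the sketch below applies the standard conditional-mean formula for jointly Gaussian variables to a single feature and a single observation. It assumes, for illustration only, a simplified setting without covariates in which the k source-specific values z have mean mu and covariance Sigma, and the observed value is x = sum(w * z) + e with noise variance tau2. The names conditional_mean, mu, Sigma and tau2 are illustrative, and this is not the Unico implementation.

```r
# Illustrative sketch only: conditional mean of the k source-specific
# values z given one observed mixture value x, assuming z ~ N(mu, Sigma)
# and x = sum(w * z) + e with e ~ N(0, tau2). Not the Unico implementation.
conditional_mean <- function(x, w, mu, Sigma, tau2) {
  cov_zx <- Sigma %*% w                               # Cov(z, x), k by 1
  var_x  <- as.numeric(t(w) %*% Sigma %*% w) + tau2   # Var(x), scalar
  as.numeric(mu + cov_zx * (x - sum(w * mu)) / var_x)
}

w     <- c(0.5, 0.3, 0.2)   # weights of k = 3 sources
mu    <- c(1, 2, 3)         # source-specific means
Sigma <- diag(c(1, 1, 1))   # source-specific covariance
z_hat <- conditional_mean(x = 2.0, w = w, mu = mu, Sigma = Sigma, tau2 = 0)

# With tau2 = 0 the weighted estimates reproduce the observed value x
sum(w * z_hat)
```

In this toy setting each source's estimate is shifted from its mean in proportion to its covariance with the mixture, so sources with larger weights or variances absorb more of the deviation of x from its expected value.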

Value

A k by m by n array with the estimated source-specific values. The first axis/dimension in the array corresponds to the different sources.
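Since the first dimension indexes sources, per-source or per-feature slices can be taken with standard array indexing. The sketch below uses a placeholder array of the documented shape rather than real output from tensor; the object names are illustrative.

```r
# Stand-in for the array returned by tensor(): k = 3 sources,
# m = 2 features, n = 5 observations (values are placeholders)
k <- 3; m <- 2; n <- 5
Z <- array(seq_len(k * m * n), dim = c(k, m, n))

Z_source1  <- Z[1, , ]   # m by n matrix of estimates for the first source
Z_feature2 <- Z[, 2, ]   # k by n matrix of estimates for the second feature
dim(Z_source1)           # 2 5
dim(Z_feature2)          # 3 5
```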

Examples

data = simulate_data(n=100, m=2, k=3, p1=1, p2=1, taus_std=0, log_file=NULL)
res = list()
res$params.hat = Unico(data$X, data$W, data$C1, data$C2, parallel=FALSE, log_file=NULL)
res$Z = tensor(data$X, data$W, data$C1, data$C2, res$params.hat, parallel=FALSE, log_file=NULL)


[Package Unico version 0.1.0 Index]