dtangle2 {dtangle}  R Documentation 
Deconvolve cell type mixing proportions from gene expression data.
dtangle2(Y, references = NULL, pure_samples = NULL, n_markers = NULL,
markers = NULL, marker_method = "ratio", weights = NULL,
sto = TRUE, inv_scale = function(x) 2^x, fit_scale = log,
loss_smry = "var", dtangle_init = TRUE, seed = NULL,
verbose = FALSE, optim_opts = NULL)
Y 
Expression matrix. (Required) Two-dimensional numeric. Must implement Each row contains expression measurements for a particular sample. Each column contains the measurements of the same gene over all individuals. Can contain either just the mixture samples to be deconvolved or both the mixture samples and the reference samples. See 
references 
Cell-type reference expression matrix. (Optional) Two-dimensional numeric. Must implement Each row contains expression measurements for a reference profile of a particular cell type. Each column contains the measurements of one gene across the reference profiles. Optionally may merge this matrix with 
pure_samples 
The pure sample indices. (Optional) List of one-dimensional integer. Must implement The ith element of the top-level list is a vector of indices (rows of 
n_markers 
Number of marker genes. (Optional) One-dimensional numeric. How many marker genes to use for deconvolution. Can be a single integer, a vector of integers (one for each cell type), or a single percentage or vector of percentages (numeric in 0 to 1). If a single integer, all cell types use that number of markers. If a vector of integers, the ith element determines how many marker genes are used for the ith cell type. If a single percentage (in 0 to 1), that percentage of markers is used for all types. If a vector of percentages, the ith percentage is used for the ith type. If not specified, the top 10% of genes are used. 
markers 
Marker gene indices. (Optional) List of one-dimensional integer. Top-level list should be same length as 
marker_method 
Method used to rank marker genes. (Optional) One-dimensional string. The method used to rank genes as markers. Defaults to “ratio” if not supplied. Only used if markers are not provided through the argument “markers”. Options are

weights 
Weights for the genes. (Optional) String or one-dimensional numeric vector. Weights for the genes in the optimization. If NULL (default), genes are not weighted differently. If 'variance', genes are weighted inversely to the variance of the references. This only works if there is more than one reference per cell type, so that the variance can be estimated. If a numeric vector, its values are used directly as the weights. They must be nonnegative. 
sto 
Sum-to-one constraint. (Optional) Boolean. Renormalize the estimates so that the cell-type proportions sum to one. 
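As an illustration of what a sum-to-one renormalization does, the base R sketch below rescales each row of a small proportion matrix so that it sums to one. The matrix `p_raw` and its values are made up for the example, and this is not necessarily how dtangle2 applies the constraint internally.

```r
# Hypothetical unnormalized estimates: 2 samples (rows) x 3 cell types (columns).
p_raw <- matrix(c(0.2, 0.3, 0.3,
                  0.5, 0.5, 1.0),
                nrow = 2, byrow = TRUE)

# Divide each row by its total so the proportions within a sample sum to one.
p_sto <- p_raw / rowSums(p_raw)

rowSums(p_sto)
```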
inv_scale 
Inverse scale transformation. (Optional) Function. Defaults to 2^x. This is equivalent to assuming that the data have been log2-transformed. If another transformation has been applied to the data, this function should specify the inverse of that transformation, so that gene expressions are put back on the linear scale. 
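The base R sketch below shows how one might pick an inverse function for a few common transformations and check that it recovers the linear scale. The vector `x` and the helper names `inv_log2`, `inv_ln`, and `inv_log2p1` are hypothetical and exist only for this illustration.

```r
# Hypothetical linear-scale expression values.
x <- c(1, 8, 64, 1000)

# Candidate inverse functions for three common transformations.
inv_log2   <- function(y) 2^y        # data stored as log2(x); same as the default
inv_ln     <- function(y) exp(y)     # data stored as natural log(x)
inv_log2p1 <- function(y) 2^y - 1    # data stored as log2(x + 1)

# Each inverse should map the transformed data back to the linear scale.
all.equal(inv_log2(log2(x)), x)
all.equal(inv_ln(log(x)), x)
all.equal(inv_log2p1(log2(x + 1)), x)
```

Under these assumptions, data stored as natural logs would call for `inv_scale = exp`.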
fit_scale 
Transformation to use as part of optimization. (Optional) Function. Function to apply to gene expressions as part of optimization. Defaults to log. 
loss_smry 
Loss summary function minimized to find estimated proportions. (Optional) String. Either 'var' (default) to minimize the (weighted) variance of the residuals or 'L2' to minimize the (weighted) sum of squares of the residuals. 
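To make the difference between the two summaries concrete, the unweighted base R sketch below computes both on a made-up residual vector `r`. It assumes R's sample variance (`var`) for the 'var' case, which may differ in detail from what dtangle2 computes internally.

```r
# Hypothetical residuals for one sample.
r <- c(0.5, -0.2, 0.1, -0.4)

loss_var <- var(r)      # variance of the residuals (R's sample variance)
loss_l2  <- sum(r^2)    # sum of squared residuals

# Adding a constant shifts every residual: the variance is unchanged,
# while the sum of squares grows.
r_shift <- r + 1
c(var = var(r_shift), L2 = sum(r_shift^2))
```

The variance summary is therefore insensitive to a constant offset in the residuals, while the sum of squares penalizes it.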
dtangle_init 
Optimization initialization. (Optional) Boolean. Boolean controlling if dtangle2 optimization should be initialized using dtangle1 estimates. 
seed 
(Optional) Integer. Value at which to seed the random seed before estimating. Optimization initialization might change if this value is not specified. 
verbose 
(Optional) Boolean. Controls if optimization output is printed or not. 
optim_opts 
(Optional) List. Optimization options passed to DEoptimR controlling optimization. Options that may be set are

List.
'estimates' a matrix of estimated mixing proportions. One row for each sample, one column for each cell type.
'markers' list of vectors of markers used for each cell type. The ith element of the list is a vector of columns of Y used as markers for the ith cell type.
'n_markers' vector of number of markers used for each cell type.
'weights' the weights used as part of the optimization.
'diag' diagnostic values for the estimated proportions.
resids_hat, loss_hat, and p_hat are the residuals, loss, and estimates for the proportions returned by dtangle2. Similarly, resids_opt, loss_opt, and p_opt are these values for the optimized value not rescaled to enforce the STO constraint.
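library(dtangle)

# True mixing proportions for the example data set.
truth <- shen_orr_ex$annotation$mixture

# For each of the three cell types, find the samples that are pure (proportion 1).
pure_samples <- lapply(1:3, function(i) {
  which(truth[, i] == 1)
})

# Log-scale expression matrix (samples in rows, genes in columns).
Y <- shen_orr_ex$data$log
n_markers <- 20

dtangle2(Y, pure_samples = pure_samples, n_markers = n_markers)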
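To see what the pure_samples construction in the example yields without loading the package data, the base R sketch below applies the same `which()` step to a small annotation matrix. The matrix `truth_toy` and its values are hypothetical.

```r
# Hypothetical annotation matrix: 5 samples (rows) x 3 cell types (columns).
# Rows 1 and 5 are pure type 1, row 2 is pure type 2, row 3 is pure type 3,
# and row 4 is a mixture.
truth_toy <- matrix(c(1,   0,   0,
                      0,   1,   0,
                      0,   0,   1,
                      0.5, 0.3, 0.2,
                      1,   0,   0),
                    ncol = 3, byrow = TRUE)

# The ith list element holds the row indices of samples that are pure type i.
pure_toy <- lapply(1:3, function(i) which(truth_toy[, i] == 1))
pure_toy
```

The mixture in row 4 appears in none of the list elements, since no cell type has proportion 1 there.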