find_markers {dtangle}    R Documentation

Find marker genes for each cell type.

Description

Find marker genes for each cell type.

Usage

find_markers(Y, references = NULL, pure_samples = NULL,
  data_type = NULL, gamma = NULL, marker_method = "ratio")

Arguments

Y

Expression matrix.

(Required) Two-dimensional numeric. Must implement as.matrix.

Each row contains expression measurements for a particular sample. Each column contains the measurements of the same gene over all samples. Y can contain either just the mixture samples to be deconvolved or both the mixture samples and the reference samples. See pure_samples and references for more details.

references

Cell-type reference expression matrix.

(Optional) Two-dimensional numeric. Must implement as.matrix. Must have same number of columns as Y. Columns must correspond to columns of Y.

Each row contains expression measurements for a reference profile of a particular cell type. Each column contains the reference-profile measurements of one gene. Optionally, this matrix may be merged with Y, with pure_samples indicating which rows of Y are pure samples. If pure_samples is not specified, references must be specified; in this case each row of references is assumed to be a distinct cell type. If both pure_samples and references are specified, multiple rows of references may refer to the same cell type, and pure_samples specifies to which cell type each row of references corresponds.
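The two ways of supplying reference profiles can be sketched on toy data as follows. This is only an illustration in base R: the matrices, sample names, and the `offset` helper variable are made up for the example and are not part of dtangle.

```r
# Toy data (hypothetical): 4 mixture samples and 3 reference profiles, 5 genes.
set.seed(1)
genes <- paste0("gene", 1:5)
Y_mix <- matrix(rnorm(4 * 5, mean = 8), nrow = 4,
                dimnames = list(paste0("mix", 1:4), genes))
refs <- matrix(rnorm(3 * 5, mean = 8), nrow = 3,
               dimnames = list(c("typeA_1", "typeA_2", "typeB_1"), genes))

# Columns must line up gene-for-gene before the matrices are combined.
stopifnot(identical(colnames(Y_mix), colnames(refs)))

# Option 1: keep the matrices separate and pass references = refs.
# Option 2: stack the references under the mixtures and point at their rows.
Y_all <- rbind(Y_mix, refs)
offset <- nrow(Y_mix)  # reference rows start after the mixture rows
pure_samples <- list(typeA = offset + 1:2, typeB = offset + 3L)
```

With the second option, `Y_all` would be passed as Y and `pure_samples` identifies the stacked reference rows.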

pure_samples

The pure sample indices.

(Optional) List of one-dimensional integer. Must implement as.list.

The i-th element of the top-level list is a vector of indices (rows of Y or references) that are pure samples of type i. If references is not specified, this argument identifies which rows of Y correspond to pure reference samples of which cell types. If references is specified, it makes the same identification for the rows of the references matrix instead.
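As a minimal sketch of how such a list might be built, assume a hypothetical annotation matrix `truth` whose rows are samples, whose columns are cell types, and whose entries are known mixing proportions. The cell-type names are invented for the example.

```r
# Hypothetical annotation: a row with a single 1 marks a pure sample.
truth <- rbind(
  c(1.0, 0.0, 0.0),
  c(0.0, 1.0, 0.0),
  c(0.5, 0.3, 0.2),
  c(0.0, 0.0, 1.0),
  c(1.0, 0.0, 0.0)
)
colnames(truth) <- c("Liver", "Brain", "Lung")

# Element i collects the row indices that are pure samples of cell type i.
pure_samples <- lapply(seq_len(ncol(truth)), function(i) {
  which(truth[, i] == 1)
})
names(pure_samples) <- colnames(truth)
```

Here rows 1 and 5 are pure samples of the first type, row 2 of the second, and row 4 of the third; row 3 is a mixture and appears in no element.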

data_type

Type of expression measurements.

(Optional) One-dimensional string.

An optional string indicating the type of the expression measurements. This is used to set gamma to a pre-determined value based upon the data type. Valid values are “microarray-probe” for probe-level microarray, “microarray-gene” for gene-level microarray, and “rna-seq” for RNA-seq. Alternatively, gamma can be set directly.

gamma

Expression adjustment term.

(Optional) One-dimensional positive numeric.

If provided as a single positive number, that value is used for gamma and overrides the value of gamma chosen by the data_type argument. If neither gamma nor data_type is specified, gamma is set to one.

marker_method

Method used to rank marker genes.

(Optional) One-dimensional string.

The method used to rank genes as markers. If not supplied defaults to “ratio”. Only used if markers are not provided to argument “markers”. Options are

  • 'ratio' selects and ranks markers by the ratio of the mean expression of each gene in each cell type to the mean of that gene in all other cell types.

  • 'regression' selects and ranks markers by estimated regression coefficients in a series of regressions, each with a single covariate that is an indicator of one cell type.

  • 'diff' selects and ranks markers based upon the difference, for each cell type, between the median expression of a gene by each cell type and the median expression of that gene by the second most highly expressed cell type.

  • 'p.value' selects and ranks markers based upon the p-value of a t-test between the median expression of a gene by each cell type and the median expression of that gene by the second most highly expressed cell type.
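To make the 'ratio' option concrete, here is a rough base-R sketch on toy data. It is one plausible reading of the description above, not dtangle's actual implementation: it works on made-up linear-scale values, ignores gamma, averages the per-type means of the other cell types, and assigns each gene to the single type where its ratio is largest. All names are illustrative.

```r
# Toy linear-scale expression: 6 pure samples (rows) by 4 genes (columns).
Y <- rbind(
  c(100,  10,  10, 50),
  c(120,  12,   8, 50),
  c( 10,  90,  10, 50),
  c( 12, 110,  12, 50),
  c( 10,  10,  80, 50),
  c(  8,  10, 120, 50)
)
colnames(Y) <- c("g1", "g2", "g3", "g4")
pure_samples <- list(A = 1:2, B = 3:4, C = 5:6)

# Mean profile of each cell type: one row per type, one column per gene.
type_means <- t(sapply(pure_samples, function(idx) {
  colMeans(Y[idx, , drop = FALSE])
}))

# For each type, ratio of its mean to the average mean of the other types.
ratio <- t(sapply(seq_along(pure_samples), function(k) {
  type_means[k, ] / colMeans(type_means[-k, , drop = FALSE])
}))
rownames(ratio) <- names(pure_samples)

# Assign each gene to the type with the largest ratio, then rank within type.
best_type <- apply(ratio, 2, which.max)
ranked <- lapply(seq_along(pure_samples), function(k) {
  g <- which(best_type == k)
  g[order(ratio[k, g], decreasing = TRUE)]
})
names(ranked) <- names(pure_samples)
```

In this toy data g1, g2, and g3 are each strongly specific to one type, while g4 is expressed equally everywhere, so its ratio is 1 and it lands at the bottom of whichever list it falls into.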

Value

A list with four elements. “L” holds the ranked markers for each cell type. “V” holds the corresponding values of the ranking method (higher is better) used to select and sort the markers. “M” is the matrix used to create the other two elements after sorting and subsetting, and “sM” is a sorted version of M.
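A common next step is to keep only the best few markers per cell type. The sketch below uses a hand-made stand-in for the returned list, assuming “L” and “V” are parallel lists with one vector per cell type, as the description suggests. The indices, scores, `n_keep`, and `min_score` are invented for illustration.

```r
# Stand-in for the list returned by find_markers(); values are made up.
mrkrs <- list(
  L = list(c(12, 40, 7), c(3, 55), c(21, 9, 30, 2)),
  V = list(c(9.1, 4.2, 1.5), c(6.0, 2.2), c(8.8, 5.0, 3.1, 1.2))
)

# Keep at most the top n markers per cell type (L is assumed already sorted).
n_keep <- 2
top_markers <- lapply(mrkrs$L, function(idx) head(idx, n_keep))

# Alternatively, keep the markers whose score clears a threshold.
min_score <- 3
strong_markers <- Map(function(idx, v) idx[v >= min_score], mrkrs$L, mrkrs$V)
```

Which cut-off is appropriate depends on the data; the fixed count and the score threshold shown here are just two simple choices.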

Examples

library(dtangle)

# A 1 in column i of the annotation marks a pure sample of cell type i.
truth <- shen_orr_ex$annotation$mixture
pure_samples <- lapply(1:3, function(i) {
  which(truth[, i] == 1)
})
Y <- shen_orr_ex$data$log
find_markers(Y = Y, pure_samples = pure_samples,
  data_type = "microarray-gene", marker_method = "ratio")

[Package dtangle version 2.0.9 Index]