iterating_seqVsInsitu {cellOrigins}    R Documentation

Faster comparisons between mixed tissue-specific RNA sequencing data and high-throughput RNA in situ hybridisation

Description

Provides the same functionality as seqVsInsitu but is computationally less expensive when combinations of anatomical terms are tested.

The number of term combinations to test increases rapidly in seqVsInsitu. For example, with 350 anatomical terms there are 61425 combinations of 2 terms and 7207200 combinations of 3 terms. This makes the exhaustive search of seqVsInsitu costly when depth > 2.
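The counts quoted above can be reproduced with a short calculation. This is a minimal sketch; it assumes the figures count combinations in which a term may occur more than once (combinations with repetition), which is what the quoted numbers match. The helper name multiset_count is made up for this illustration and is not part of cellOrigins.

```r
n_terms <- 350

# Hypothetical helper: number of combinations with repetition,
# i.e. choose(n + k - 1, k).
multiset_count <- function(n, k) choose(n + k - 1, k)

multiset_count(n_terms, 2)  # 61425, the figure quoted for 2 terms
multiset_count(n_terms, 3)  # 7207200, the figure quoted for 3 terms

# For comparison, combinations without repetition are slightly fewer.
choose(n_terms, 2)  # 61075
choose(n_terms, 3)  # 7084700
```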

iterating_seqVsInsitu reduces the computational cost by first testing combinations of only a few terms. In each subsequent iteration the cardinality of the combinations is increased by one, but only the top anatomical terms of the previous iteration are used, which limits the number of combinations tested.
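The general idea of the iteration can be sketched with a toy example. This is not the package's implementation: iterate_combinations and toy_score are hypothetical names, the scoring is a stand-in for the real comparison against in situ data, and combn() enumerates combinations without repetition, which is a simplification.

```r
# Toy illustration of "test all combinations, keep the terms of the top
# results, then increase the combination size by one".
iterate_combinations <- function(terms, score_fun, start_depth = 2,
                                 upto_depth = 3, use_topN = 5) {
  results <- list()
  candidates <- terms
  for (depth in start_depth:upto_depth) {
    # All combinations of the current candidate terms at this depth.
    combos <- combn(candidates, depth, simplify = FALSE)
    scores <- vapply(combos, score_fun, numeric(1))
    ord <- order(scores, decreasing = TRUE)
    results[[paste0("depth_", depth)]] <- data.frame(
      combination = vapply(combos[ord], paste, character(1), collapse = " + "),
      score = scores[ord]
    )
    # Only terms occurring in the top results survive to the next depth.
    top <- combos[ord][seq_len(min(use_topN, length(combos)))]
    candidates <- unique(unlist(top))
  }
  results
}

# Made-up term weights chosen so that no two combinations tie.
weights <- c(a = 32, b = 16, c = 8, d = 4, e = 2, f = 1)
toy_score <- function(combo) sum(weights[combo])

res <- iterate_combinations(names(weights), toy_score, use_topN = 3)
nrow(res$depth_2)  # all 15 pairs of the 6 terms are scored
nrow(res$depth_3)  # only 4 triples, built from the 4 surviving terms
```

In this toy run the exhaustive search at depth 3 would score 20 triples, whereas the pruned search scores 4. The trade-off is that a good combination can be missed if none of its terms appears among the top results of the previous iteration.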

Usage

iterating_seqVsInsitu(seq_signature, upto_depth, use_topN = 50,
  start_depth = 2, insitu = cellOrigins::BDGP_insitu_dmel_embryo,
  insitu_discovery_function = discovery.log, saturate = 500,
  prior = prior.temporal_proximity_is_good)

Arguments

seq_signature

A named vector containing FPKM RNAseq data. Each element name must correspond to the names used in the insitu argument. NAs are permitted.

upto_depth

Number of terms to combine in the final iteration.

use_topN

How many of the top results from the previous iteration to use to find the terms for the current iteration.

start_depth

Number of terms to combine in the first iteration. All combinations of all terms are tested at this step.

insitu

Matrix with RNA in situ hybridisation data. Rows are transcript names (queried by probes; the same names as used for seq_signature) and columns are anatomical terms (possibly combined with developmental stages). If a probe stains in a particular tissue, the value is 1, otherwise 0. Defaults to BDGP_insitu_dmel_embryo, a staining dataset for fruit fly embryos.

insitu_discovery_function

A function that converts FPKM values to the probability of discovery by RNA in situ hybridisation. Values must lie in the open interval ]0, 1[; exactly 0 and 1 are not permitted. Defaults to discovery.log, an approximation of empirically determined discovery probabilities. Other available functions are discovery.linear and discovery.identic.

saturate

Passed on to insitu_discovery_function. The data-set-dependent maximum value at which the discovery probability should saturate. Defaults to 500 (FPKM).

prior

A function that evaluates to the log2 prior probability of each anatomical term or combination of terms. Defaults to prior.temporal_proximity_is_good, which works well with BDGP_insitu_dmel_embryo. The alternative prior.all_equal assumes equal probability of all terms.
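As an illustration of the input shapes described above, the following sketch builds toy inputs. All gene and term names are made up. The function toy_discovery is hypothetical: its argument names and its log-shaped curve are assumptions for illustration, not the signature or behaviour of discovery.log.

```r
# Named FPKM vector; NAs are permitted.
toy_signature <- c(geneA = 12.5, geneB = NA, geneC = 300)

# Binary staining matrix: rows are transcripts, columns are anatomical terms.
toy_insitu <- matrix(
  c(1, 0, 0,
    0, 1, 1,
    1, 1, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("term1", "term2", "term3"))
)

# Element names of the signature should match row names of the matrix.
unmatched <- setdiff(names(toy_signature), rownames(toy_insitu))

# Hypothetical discovery function (assumed arguments `fpkm`, `saturate`):
# maps FPKM to a probability that saturates at `saturate` and is clamped
# away from exactly 0 and 1 by a small `eps`.
toy_discovery <- function(fpkm, saturate = 500, eps = 1e-6) {
  p <- log2(pmin(fpkm, saturate) + 1) / log2(saturate + 1)
  pmin(pmax(p, eps), 1 - eps)
}

p <- toy_discovery(toy_signature)
```

Checking unmatched before a long run is a cheap way to catch naming mismatches between the sequencing data and the in situ matrix.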

Value

A named list containing one matrix per iteration, each like the matrix produced by seqVsInsitu.

See Also

seqVsInsitu

Examples

## Not run: 
fpath <- system.file("extdata", "vncMedianCoverage.tsv", package="cellOrigins")
vncExpression <- read.delim(file = fpath, header=FALSE, as.is=TRUE)

expression <- vncExpression$V2
names(expression) <- vncExpression$V1

oracleResponse <- iterating_seqVsInsitu(expression, 3)
head(oracleResponse[[1]])
head(oracleResponse[[2]])
diagnosticPlots(oracleResponse)

## End(Not run)

[Package cellOrigins version 0.1.3 Index]