iterating_seqVsInsitu {cellOrigins}

R Documentation

Faster comparisons between mixed tissue-specific RNA sequencing data and high-throughput RNA in situ hybridisation

Description

Provides the same functionality as seqVsInsitu but is computationally less expensive when combinations of anatomical terms are tested.

The number of term combinations to test increases rapidly in seqVsInsitu. For example, with 350 anatomical terms there are 61425 combinations of 2 terms and 7207200 combinations of 3 terms. This makes the exhaustive search of seqVsInsitu costly for depth > 2.
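The growth in the number of combinations can be checked with base R. The figures quoted above match the count of combinations with repetition, choose(n + k - 1, k); whether seqVsInsitu actually enumerates repeated terms is an assumption here, so the counts without repetition are shown for comparison.

```r
n_terms <- 350

# Combinations with repetition: choose(n + k - 1, k)
with_rep_2 <- choose(n_terms + 2 - 1, 2)   # 61425
with_rep_3 <- choose(n_terms + 3 - 1, 3)   # 7207200

# Combinations without repetition, for comparison
no_rep_2 <- choose(n_terms, 2)             # 61075
no_rep_3 <- choose(n_terms, 3)             # 7084700

# Going from 2 to 3 terms multiplies the work by more than 100
with_rep_3 / with_rep_2
```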

iterating_seqVsInsitu reduces the computational cost by initially testing combinations of only a few terms. In each subsequent iteration the cardinality of the combinations is increased by one, but only the anatomical terms that appear in the top results of the previous iteration are used, which reduces the number of combinations tested.
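The pruning idea can be sketched in a few lines of base R. This is a toy illustration under stated assumptions, not the cellOrigins implementation: `iterative_search`, `score_fun` and `weights` are hypothetical names, and the score is an arbitrary stand-in for the real comparison between sequencing and in situ data.

```r
# Toy sketch of the iterative pruning strategy (not the package code).
# score_fun is a hypothetical function that scores one combination of terms.
iterative_search <- function(terms, score_fun, start_depth = 2,
                             upto_depth = 3, use_topN = 5) {
  results <- list()
  candidates <- terms
  for (depth in start_depth:upto_depth) {
    # All combinations of the current candidate terms, one per column
    combos <- combn(candidates, depth)
    scores <- apply(combos, 2, score_fun)
    ranked <- order(scores, decreasing = TRUE)
    combos <- combos[, ranked, drop = FALSE]
    results[[paste0("depth", depth)]] <- list(combos = combos,
                                               scores = scores[ranked])
    # Only terms that occur in the top results survive to the next depth
    top <- combos[, seq_len(min(use_topN, ncol(combos))), drop = FALSE]
    candidates <- unique(as.vector(top))
  }
  results
}

# Usage with made-up terms and weights
terms <- letters[1:10]
weights <- setNames(2^(0:9), terms)
res <- iterative_search(terms, function(combo) sum(weights[combo]))

ncol(res$depth2$combos)  # all pairs of the 10 terms
ncol(res$depth3$combos)  # triples built only from the surviving terms
```

In this toy run the second iteration scores far fewer triples than an exhaustive search over all ten terms would, at the price of possibly missing combinations whose members did not rank highly at the previous depth.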

Usage

iterating_seqVsInsitu(seq_signature, upto_depth, use_topN = 50,
  start_depth = 2, insitu = cellOrigins::BDGP_insitu_dmel_embryo,
  insitu_discovery_function = discovery.log, saturate = 500,
  prior = prior.temporal_proximity_is_good)

Arguments

`seq_signature`	A named vector containing FPKM RNAseq data. Each element name must correspond to the names used in the `insitu` argument. NAs are permitted.
`upto_depth`	Number of terms to combine in the final iteration.
`use_topN`	How many of the top results from the previous iteration to use to find the terms for the current iteration.
`start_depth`	Number of terms to combine in the first iteration. All combinations of all terms are tested at this step.
`insitu`	Matrix with RNA in situ hybridisation data. Rows are transcript names (queried by probes; the same names as used for `seq_signature`) and columns are anatomical terms (possibly combined with developmental stages). If a probe stains in a particular tissue, the value is 1, otherwise 0. Defaults to `BDGP_insitu_dmel_embryo`, a staining dataset for fruit fly embryos.
`insitu_discovery_function`	A function that converts FPKM values to the probability of discovery by RNA in situ hybridisation. Values must lie in the open interval ]0, 1[; 0 and 1 themselves are not permitted. Defaults to `discovery.log`, an approximation of empirically determined discovery probabilities. Other available functions are `discovery.linear` and `discovery.identic`.
`saturate`	Passed on to the `insitu_discovery_function`. The dataset-dependent maximum value at which the discovery probability should saturate. Defaults to 500 (FPKM).
`prior`	A function that evaluates to the log2 prior probability of each anatomical term or combination of terms. Defaults to `prior.temporal_proximity_is_good`, which works well with `BDGP_insitu_dmel_embryo`. `prior.all_equal` assumes equal probability of all terms.
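
The expected shapes of the inputs can be illustrated with toy data. Everything below is hypothetical: the gene and term names are made up, and `toy_discovery` is only a stand-in with an assumed signature that shows what "saturating, strictly between 0 and 1" could look like. It is not `discovery.log`.

```r
# Hypothetical toy inputs; names and values are made up for illustration.
seq_signature <- c(geneA = 120.5, geneB = 0, geneC = NA, geneD = 980)

insitu <- matrix(
  c(1, 0,
    0, 1,
    1, 1,
    0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                  c("termX", "termY"))
)

# The names of the expression vector should match the matrix row names
names_ok <- all(names(seq_signature) %in% rownames(insitu))

# Toy discovery function (an assumption, not the package's discovery.log):
# log-scaled, saturating at `saturate`, clamped into the open interval ]0, 1[
toy_discovery <- function(fpkm, saturate = 500, eps = 1e-6) {
  p <- log1p(pmin(fpkm, saturate)) / log1p(saturate)
  pmin(pmax(p, eps), 1 - eps)
}

p <- toy_discovery(seq_signature)
```

The clamping step is what keeps an FPKM of 0, or a value above `saturate`, from producing a probability of exactly 0 or 1; NA inputs stay NA.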

Value

Returns a named list that contains one matrix per iteration, each like those produced by seqVsInsitu.

Examples

## Not run: 
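# Locate and read the example expression table shipped with cellOrigins
fpath <- system.file("extdata", "vncMedianCoverage.tsv", package="cellOrigins")
vncExpression <- read.delim(file = fpath, header=FALSE, as.is=TRUE)

# Build the named expression vector: V1 holds the names, V2 the values
expression <- vncExpression$V2
names(expression) <- vncExpression$V1

# Iterate from the default start_depth = 2 up to combinations of 3 terms
oracleResponse <- iterating_seqVsInsitu(expression, 3)
# The returned list holds one result matrix per iteration
head(oracleResponse[[1]])
head(oracleResponse[[2]])
diagnosticPlots(oracleResponse)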

## End(Not run)