iterating_seqVsInsitu {cellOrigins}    R Documentation

Faster comparisons between mixed tissue-specific RNA sequencing data and high-throughput RNA in situ hybridisation


Description

Provides the same functionality as seqVsInsitu but is computationally less expensive when combinations of anatomical terms are tested.

The number of term combinations that seqVsInsitu has to test grows rapidly with the number of terms combined. For example, with 350 anatomical terms there are 61425 combinations of 2 terms and 7207200 combinations of 3 terms. This makes the exhaustive search of seqVsInsitu costly for depth > 2.
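The figures quoted above can be checked in base R. As a small sketch: the quoted counts equal the number of combinations with repetition, choose(n + k - 1, k), in which a term may occur more than once in a combination. That seqVsInsitu enumerates its combinations this way is an inference from the numbers, not something this page states. The counts without repetition are shown for comparison.

```r
n_terms <- 350

# Combinations with repetition of k terms out of n: choose(n + k - 1, k)
with_rep <- sapply(2:3, function(k) choose(n_terms + k - 1, k))

# Combinations without repetition, for comparison
without_rep <- sapply(2:3, function(k) choose(n_terms, k))

print(with_rep)     # 61425 and 7207200, the figures quoted in the text
print(without_rep)  # 61075 and 7084700
```

Either way, the count grows by roughly two orders of magnitude for each additional term.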

iterating_seqVsInsitu reduces this cost by first testing exhaustively only combinations of a few terms (start_depth). In each subsequent iteration the cardinality of the combinations is increased by one, up to upto_depth, but only the anatomical terms found in the top results (use_topN) of the previous iteration are combined. This keeps the number of tested combinations small.
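The pruning idea can be sketched in a few lines of base R. This is not the package's implementation: score_combo is a hypothetical toy score standing in for the real comparison against in situ data, the term names are invented, and combn is used for simplicity, so combinations are formed without repetition.

```r
# Hypothetical toy data: 20 invented terms, three of which form the "true" origin
terms  <- paste0("term", 1:20)
target <- c("term3", "term7", "term12")

# Hypothetical score: rewards target terms, slightly penalises larger combinations
score_combo <- function(combo) sum(combo %in% target) - 0.01 * length(combo)

iterate_search <- function(terms, start_depth, upto_depth, use_topN) {
  results    <- list()
  candidates <- terms                      # first iteration uses all terms
  for (depth in start_depth:upto_depth) {
    if (length(candidates) < depth) break  # not enough terms left to combine
    combos <- combn(candidates, depth, simplify = FALSE)
    scores <- vapply(combos, score_combo, numeric(1))
    ord    <- order(scores, decreasing = TRUE)
    results[[paste0("depth", depth)]] <-
      list(combos = combos[ord], scores = scores[ord])
    # Only terms that occur in the top use_topN results are carried forward
    top        <- combos[ord[seq_len(min(use_topN, length(ord)))]]
    candidates <- unique(unlist(top))
  }
  results
}

res <- iterate_search(terms, start_depth = 2, upto_depth = 3, use_topN = 10)
length(res$depth2$scores)   # all pairs of the 20 terms
length(res$depth3$scores)   # far fewer than choose(20, 3) triples
res$depth3$combos[[1]]      # best-scoring triple
```

The trade-off is that a term which only scores well in larger combinations can be dropped early, so use_topN should not be chosen too small.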


Usage

iterating_seqVsInsitu(seq_signature, upto_depth, use_topN = 50,
  start_depth = 2, insitu = cellOrigins::BDGP_insitu_dmel_embryo,
  insitu_discovery_function = discovery.log, saturate = 500,
  prior = prior.temporal_proximity_is_good)



Arguments

seq_signature
A named vector containing FPKM RNA-seq data. Element names must match the transcript names (row names) used in the insitu argument. NA values are permitted.


upto_depth
Number of terms to combine in the final iteration.


use_topN
Number of top results from the previous iteration from which the terms for the current iteration are taken. Defaults to 50.


start_depth
Number of terms to combine in the first iteration. All combinations of all terms are tested at this step. Defaults to 2.


insitu
Matrix with RNA in situ hybridisation data. Rows are transcript names (as queried by probes; the same names as used for seq_signature) and columns are anatomical terms (possibly combined with developmental stages). The value is 1 if a probe stains in a particular tissue and 0 otherwise. Defaults to BDGP_insitu_dmel_embryo, a staining data set for fruit fly embryos.
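A minimal sketch of the expected data shapes, assuming nothing beyond the description above. All transcript and term names below are invented for illustration and do not come from BDGP_insitu_dmel_embryo.

```r
# Hypothetical staining matrix: transcripts in rows, anatomical terms in columns
insitu_toy <- matrix(
  c(1, 0, 0,
    0, 1, 1,
    0, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("geneA", "geneB", "geneC"),
    c("brain_stage11", "gut_stage11", "gut_stage13")
  )
)

# Hypothetical FPKM vector; NA is permitted, geneD has no in situ data here
seq_toy <- c(geneA = 120.5, geneB = NA, geneD = 3.2)

# Transcripts that occur in both data sets
shared <- intersect(names(seq_toy), rownames(insitu_toy))
print(shared)

# Sanity check: the staining matrix should contain only 0 and 1
stopifnot(all(insitu_toy %in% c(0, 1)))
```

Checking the name overlap before a long run is a cheap way to catch mismatched identifier schemes.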


insitu_discovery_function
A function that converts FPKM values to the probability of discovery by RNA in situ hybridisation. The returned values must lie in the open interval ]0, 1[; exactly 0 and exactly 1 are not permitted. Defaults to discovery.log, an approximation of empirically determined discovery probabilities. Other available functions are discovery.linear and discovery.identic.
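To illustrate the open-interval requirement, here is a sketch of a function that maps FPKM values to probabilities strictly between 0 and 1. It is not the package's discovery.log: the function name, the argument names (fpkm, saturate, eps) and the log-shaped curve are all assumptions, and the signature that cellOrigins actually expects is not documented on this page.

```r
# Hypothetical discovery function, for illustration only
discovery_sketch <- function(fpkm, saturate = 500, eps = 0.01) {
  # Log-shaped rise that reaches 1 at the saturation value
  p <- log1p(pmin(fpkm, saturate)) / log1p(saturate)
  # Clamp into the open interval so that 0 and 1 never occur; NA stays NA
  pmin(pmax(p, eps), 1 - eps)
}

p <- discovery_sketch(c(0, 5, 500, 10000, NA))
print(p)
```

Clamping with a small eps avoids infinite log probabilities when a transcript has zero coverage or very high expression.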


saturate
Passed on to insitu_discovery_function. The data-set-dependent maximum value at which the discovery probability saturates. Defaults to 500 (FPKM).


prior
A function that evaluates to the log2 prior probability of each anatomical term or combination of terms. Defaults to prior.temporal_proximity_is_good, which works well with BDGP_insitu_dmel_embryo. The alternative prior.all_equal assumes equal probability for all terms.


Value

A named list containing one matrix per iteration, each in the same format as the matrices produced by seqVsInsitu.

See Also



Examples

## Not run: 
fpath <- system.file("extdata", "vncMedianCoverage.tsv", package="cellOrigins")
# Read the two-column table: V1 = transcript name, V2 = FPKM value
vncExpression <- read.delim(file = fpath, header = FALSE)

expression <- vncExpression$V2
names(expression) <- vncExpression$V1

oracleResponse <- iterating_seqVsInsitu(expression, 3)

## End(Not run)

[Package cellOrigins version 0.1.3 Index]