seqVsInsitu {cellOrigins}    R Documentation
Determine the most likely source(s) of a tissue-specific RNAseq dataset
Description
Compares tissue-specific RNA sequencing coverage with high-throughput RNA in situ hybridisation patterns of gene expression. All pattern combinations are tested in an exhaustive search.
Usage
seqVsInsitu(seq_signature, depth = 2, insitu = cellOrigins::BDGP_insitu_dmel_embryo,
insitu_discovery_function = discovery.log, saturate = 500,
prior = prior.temporal_proximity_is_good)
Arguments
seq_signature
A named vector containing FPKM RNAseq data. Each element name must correspond to the names used in insitu.
depth
Number of RNA in situ expression patterns to combine to identify mixed populations. If 1, the expression patterns are used as given. Otherwise, all combinations of depth patterns are merged and tested.
insitu
Matrix with RNA in situ hybridisation results. Rows are transcript names (the same names as used for seq_signature).
insitu_discovery_function
A function that converts FPKM values to the probability of discovery by RNA in situ hybridisation. Probabilities must lie in the open interval (0, 1); the values 0 and 1 are not permitted.
Defaults to discovery.log.
saturate
Passed on to insitu_discovery_function.
prior
A function that returns the log2 prior probability of each anatomical term or combination of terms.
Defaults to prior.temporal_proximity_is_good.
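The contract for insitu_discovery_function (FPKM in, probability strictly between 0 and 1 out) can be illustrated with a small stand-alone function. This is a hypothetical sketch, not the package's discovery.log: the name discovery_log_sketch, the log-scale ramp and the eps clamp are assumptions, and saturate is assumed here to be the FPKM value at and above which detection is treated as (almost) certain. Exact 0 or 1 would make the logarithm of p or 1 - p infinite, which is presumably why those values are forbidden.

```r
# Hypothetical discovery function (NOT cellOrigins::discovery.log): maps FPKM
# values to the probability that an RNA in situ probe detects the transcript.
discovery_log_sketch <- function(fpkm, saturate = 500, eps = 1e-3) {
  # Assumed log-scale ramp: 0 FPKM -> 0, 'saturate' FPKM or more -> 1
  p <- log1p(pmax(fpkm, 0)) / log1p(saturate)
  p <- pmin(p, 1)
  # Clamp into the open interval (0, 1): exact 0 or 1 would make
  # log2(p) or log2(1 - p) infinite in a Bayesian score
  pmin(pmax(p, eps), 1 - eps)
}

fpkm <- c(geneA = 0, geneB = 5, geneC = 500, geneD = 2000)
round(discovery_log_sketch(fpkm), 3)
```

With these inputs the sketch returns about 0.001, 0.288, 0.999 and 0.999: unexpressed genes get a small but non-zero probability, and everything at or above the saturation point is capped just below 1.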
Details
First, the function calculates for each sequenced transcript how likely it is to produce an RNA in situ signal, given its expression strength. Using these staining probabilities and Bayes' rule, the function then calculates, for each of the given RNA in situ hybridisation patterns, a probability score that the pattern was produced by the same gene expression pattern as the sequenced transcriptome.
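The Bayesian scoring described above can be sketched in a few lines of base R, assuming genes are independent and each gene is either stained or unstained in a pattern: a stained gene contributes log2(p), an unstained gene log2(1 - p), where p is its staining probability. score_pattern, the toy genes and the toy patterns are hypothetical; the returned names merely mirror the columns listed under Value, and this is not the package's implementation.

```r
# Hypothetical scoring of one in situ pattern against staining probabilities.
# p_stain: named staining probabilities derived from the RNAseq data
# pattern: named logical vector, TRUE where the in situ pattern shows staining
score_pattern <- function(p_stain, pattern, log2_prior = 0) {
  stopifnot(identical(names(p_stain), names(pattern)))
  presence <- sum(log2(p_stain[pattern]))       # genes stained in situ
  absence  <- sum(log2(1 - p_stain[!pattern]))  # genes not stained in situ
  c(posterior = log2_prior + presence + absence,
    prior = log2_prior,
    likelihood.from.absence.insitu = absence,
    likelihood.from.presence.insitu = presence)
}

p_stain <- c(g1 = 0.9, g2 = 0.8, g3 = 0.1, g4 = 0.05)
patterns <- rbind(
  termA = c(g1 = TRUE,  g2 = TRUE,  g3 = FALSE, g4 = FALSE),
  termB = c(g1 = FALSE, g2 = FALSE, g3 = TRUE,  g4 = TRUE)
)
# One row of scores per term, then sort by decreasing posterior
scores <- t(apply(patterns, 1, function(pat) score_pattern(p_stain, pat)))
scores[order(scores[, "posterior"], decreasing = TRUE), ]
```

termA, whose stained genes are the strongly expressed ones, receives the higher posterior; working in log2 turns the product of per-gene probabilities into a sum.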
If depth > 1, the function can identify the origins of mixed (impure) sequenced material. To do so, it merges multiple RNA in situ hybridisation patterns and compares each merged pattern with the sequenced data, which simulates the outcome of mixing cell populations.
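The merging step can be sketched as follows, assuming a gene counts as stained in a simulated mix if it is stained in any component pattern (an element-wise OR). merge_patterns and the toy patterns are hypothetical and only illustrate the idea of testing every combination of depth patterns; the package may merge patterns differently.

```r
# Hypothetical sketch: merge every combination of 'depth' binary in situ
# patterns (rows = terms, columns = genes) by element-wise OR.
merge_patterns <- function(patterns, depth = 2) {
  combos <- combn(rownames(patterns), depth, simplify = FALSE)
  merged <- t(vapply(combos,
                     function(terms) apply(patterns[terms, , drop = FALSE], 2, any),
                     logical(ncol(patterns))))
  # Label each merged row with its component terms, e.g. "termA+termB"
  rownames(merged) <- vapply(combos, paste, character(1), collapse = "+")
  colnames(merged) <- colnames(patterns)
  merged
}

patterns <- rbind(
  termA = c(g1 = TRUE,  g2 = TRUE,  g3 = FALSE, g4 = FALSE),
  termB = c(g1 = FALSE, g2 = TRUE,  g3 = TRUE,  g4 = FALSE),
  termC = c(g1 = FALSE, g2 = FALSE, g3 = FALSE, g4 = TRUE)
)
merge_patterns(patterns, depth = 2)
```

With three terms and depth = 2 there are choose(3, 2) = 3 merged patterns; the number of combinations grows quickly with depth, which is worth keeping in mind for an exhaustive search.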
seq_signature is best generated by taking the mean coverage of the regions that are actually targeted by the RNA in situ hybridisation probes. This circumvents problems caused by misannotation, overlapping transcripts and faulty quantitation of individual transcripts from sequencing data. A protocol for generating such datasets is given in the package reference.
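Averaging coverage over probe regions can be illustrated on a toy per-base coverage vector. This is a hypothetical sketch: probe_mean_coverage, the coverage values and the probe coordinates (1-based, inclusive) are made up, any normalisation step is omitted, and the protocol in the package reference remains the authoritative description.

```r
# Hypothetical sketch: one signature value per probe, computed as the mean
# per-base coverage over the region that the probe targets.
probe_mean_coverage <- function(coverage, probes) {
  sig <- mapply(function(s, e) mean(coverage[s:e]), probes$start, probes$end)
  names(sig) <- probes$name  # names must match the rownames of 'insitu'
  sig
}

coverage <- c(0, 0, 4, 6, 8, 2, 0, 10, 10, 10)
probes <- data.frame(name = c("probe1", "probe2"),
                     start = c(3, 8), end = c(5, 10))
probe_mean_coverage(coverage, probes)
```

Here probe1 covers positions 3 to 5 (mean 6) and probe2 positions 8 to 10 (mean 10). Because only the probed region is counted, reads from a neighbouring or overlapping transcript outside that region do not affect the value.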
Value
A matrix with a row for each anatomical term (or combination of terms) and at least four columns. The rows are sorted by decreasing posterior score, so the top term is the most likely source of the RNAseq transcriptome.
posterior
A log2 posterior probability score. The highest value is given to the most likely tissue of origin. The value is only meaningful in comparison with other values within the same result set.
prior
Log2 prior probability of the anatomical term(s), as returned by the function given as prior.
likelihood.from.absence.insitu
Log2 probability score from all the genes for which RNA in situ hybridisation did not report staining.
likelihood.from.presence.insitu
Log2 probability score from all the genes for which RNA in situ hybridisation reported staining.
remaining columns
Number of additional expressed genes that each term in the tested combination adds to the in situ signature. Sometimes additional terms add very few or no new genes at all; such tissue contributions are meaningless artefacts.
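The posterior column is the sum of the other three named columns. Because the scores are on a log2 scale, they differ from the log2 of the (unknown) probabilities of identity by a constant that is shared within one result set; only differences between scores are therefore meaningful.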
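Since the posterior scores are log2 values with an unknown shared offset, they can be turned into relative probabilities among the tested candidates by exponentiating and normalising. The scores below are made-up numbers; with real output they would come from the posterior column of the result. The normalised values only compare the candidates that were tested and inherit every assumption of the underlying model.

```r
# Hypothetical log2 posterior scores for three candidate terms
posterior <- c(termA = -120.5, termB = -123.5, termC = -140)

# Subtract the maximum before exponentiating to avoid underflow,
# then normalise so the candidates' relative probabilities sum to 1
rel <- 2^(posterior - max(posterior))
rel <- rel / sum(rel)
round(rel, 4)
```

A score difference of d corresponds to a factor of 2^d: termA is 2^3 = 8 times as probable as termB, while termC, 19.5 units lower, is negligible.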
See Also
iterating_seqVsInsitu, BDGP_insitu_dmel_embryo, discovery.log, discovery.linear, discovery.identic, prior.temporal_proximity_is_good, prior.all_equal, diagnosticPlots.
Examples
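library(cellOrigins)
# Example coverage table shipped with the package (no header row:
# column 1 holds transcript names, column 2 the coverage values)
fpath <- system.file("extdata", "vncMedianCoverage.tsv", package="cellOrigins")
vncExpression <- read.delim(file = fpath, header=FALSE, as.is=TRUE)
# Build the named vector expected by seq_signature
expression <- vncExpression$V2
names(expression) <- vncExpression$V1
# Test single, unmerged in situ patterns only
result <- seqVsInsitu(expression, depth=1)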