BDGP_insitu_dmel_embryo {cellOrigins}    R Documentation

Patterns of gene expression in Drosophila melanogaster embryos


High-confidence dataset of embryonic Drosophila melanogaster RNA expression patterns at 6 developmental stages. This dataset was generated by filtering the "BDGP insitu" high-throughput RNA in situ hybridisation data set (Tomancak, Genome Biol. 2007;8(7):R145) for high-confidence results. Only genes useful for tissue identification were retained, and they thus represent gene expression fingerprints of organs.




The format is:
 num [1:2395, 1:337] 1 0 0 0 1 1 0 1 1 1 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:2395] "LD11379" "LD11394" "LD12611" "LD12613" ...
  ..$ : chr [1:337] "1|maternal" "2|pole cell" "3|pole cell" "4|germ cell" ...


The expression data are collated in a matrix. The columns of the matrix are labeled stage|domain (e.g. "6|midgut"). The expression domains are denoted using the BDGP insitu controlled anatomical vocabulary. The rows are labeled with transcript/probe names according to the BDGP insitu data set. The genomic coordinates of the hybridisation probes (Drosophila melanogaster genome release 5) are supplied as an additional file in this package.
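As a minimal sketch of working with these labels, the following base R code builds a small toy matrix in the same layout and splits the column names into stage and domain. The values in the toy matrix are invented for illustration and are not taken from the data set; only the label format follows the description above.

```r
# Toy stand-in for the real matrix: rows are probes, columns are "stage|domain".
# The 0/1 values are invented for illustration only.
expr <- matrix(
  c(1, 0, 0,
    0, 1, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("LD11379", "LD11394"),
                  c("1|maternal", "2|pole cell", "6|midgut"))
)

# Split each column label at the first "|" into stage and domain.
labels <- colnames(expr)
stage  <- as.integer(sub("\\|.*$", "", labels))
domain <- sub("^[^|]*\\|", "", labels)

# Select all columns belonging to one stage, keeping the matrix shape.
stage6 <- expr[, stage == 6, drop = FALSE]

# List the stage|domain labels in which a given probe is expressed.
expressed_in <- labels[expr["LD11394", ] == 1]
```

The same indexing should carry over to the full matrix once it is loaded, since only the dimnames are used.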

The data set characterises the expression of 2395 RNA species. This is the differentially expressed, high-confidence subset of BDGP insitu. The starting point for dataset preparation was the published SQL database dump with annotations. All in situ hybridisations for wild type Drosophila melanogaster embryos were extracted from this source. The reporter construct annotations were not used.

Only high-confidence expression patterns were retained. The gene expression in the BDGP insitu database was annotated by human curators from microscopic images. Depending on the quality of the images and staining, some expression patterns were easier to discern than others. The curators recorded their confidence in each expression call together with the annotation data of each gene. The filtering criteria for including a probe's expression pattern were that

  1. the final call of the annotators was 'acceptable',

  2. there was no remark about staining intensity (pointing to substandard quality),

  3. the microscopic image was not excluded by quality control,

  4. the annotation was displayed on the database's website,

  5. the probe/staining was not flagged for repeating or for giving up,

  6. the final word of the annotators (a free text field) did not contain negative remarks like "weak", "nonspecific", "muddy", "poor", "dull", "spillover" or "suspicious" staining; lack of staining penetration; a call to repeat the staining; signs of doubt (e.g. "might", "perhaps", "maybe", "could", "not sure", "not confirmed", "unconvincing", "conflicting", "can't say", "failure", "wrong", "junk"); remarks on camera problems, artefacts or transposons, and

  7. there was no annotation with "no staining" (to avoid false negatives).
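The criteria above amount to a row filter over the annotation records. The following is a minimal sketch of criteria 1 and 6 only, in base R. The table, its column names (`final_call`, `final_word`) and its records are hypothetical and do not reflect the actual BDGP database schema, and the keyword list is abbreviated.

```r
# Hypothetical annotation table; column names and records are assumptions
# made for this sketch and do not reflect the actual BDGP database schema.
annot <- data.frame(
  probe      = c("P1", "P2", "P3", "P4"),
  final_call = c("acceptable", "acceptable", "repeat", "acceptable"),
  final_word = c("nice staining", "weak, might be real", "", "Muddy background"),
  stringsAsFactors = FALSE
)

# Abbreviated list of the negative remarks named in criterion 6.
negative <- c("weak", "nonspecific", "muddy", "poor", "dull", "spillover",
              "suspicious", "might", "perhaps", "maybe", "not sure")
pattern  <- paste(negative, collapse = "|")

# Criterion 1: the final call is 'acceptable'.
# Criterion 6: the free-text remark contains none of the negative terms.
keep <- annot$final_call == "acceptable" &
  !grepl(pattern, annot$final_word, ignore.case = TRUE)

high_confidence <- annot$probe[keep]
```

Plain substring matching is deliberately conservative here: it would also reject remarks such as "weakly", which is usually acceptable for a high-confidence filter but may not match how the original filtering was carried out.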

Genes with known ubiquitous expression (including faint-ubiquitous) at any stage were excluded.

Genes for which there was no published probe sequence (approximately 300) were excluded. Most of the RNA in situ hybridisation probes originated from the Drosophila Gold Collection and the Drosophila Gene Collection.

Annotated gene expression in each anatomical unit was propagated to all of its anatomical subunits. For example, "5|Malpighian tubule primordium" expression was propagated to "5|Malpighian tubule main body primordium" and "5|Malpighian tubule tip cell primordium". This propagation was necessary to make both the presence and the absence of staining meaningful. In the original data set, gene expression was usually annotated only to the largest unit of expression and not to its subunits. For instance, if there was expression in the whole foregut, there was necessarily also expression in its pharynx subunit, but expression in the pharynx was usually not recorded in such a case. Consequently, some anatomical units had very few expressed genes associated with them in the original data set: only those genes that were expressed exclusively in those units and in none of their superordinate units.
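The propagation step can be sketched in base R as follows. The column labels are those of the Malpighian tubule example above. The probe names (`gA`, `gB`, `gC`), the 0/1 values and the parent-to-subunit map are hypothetical, and the loop is an illustration of the idea rather than the code used to prepare the data set.

```r
# Toy matrix with one parent unit and its two subunits at stage 5.
# Probe names and values are invented for illustration.
parent   <- "5|Malpighian tubule primordium"
children <- c("5|Malpighian tubule main body primordium",
              "5|Malpighian tubule tip cell primordium")
expr <- matrix(
  c(1, 0, 0,   # gA: annotated only to the parent unit
    0, 0, 1,   # gB: annotated only to one subunit
    0, 0, 0),  # gC: not expressed in any of these units
  nrow = 3, byrow = TRUE,
  dimnames = list(c("gA", "gB", "gC"), c(parent, children))
)

# Hypothetical parent -> subunit map; in practice this would be derived
# from the hierarchy of the anatomical vocabulary.
subunits <- setNames(list(children), parent)

# Propagate expression downwards: a subunit counts as expressed if it was
# annotated itself or if its parent unit was.
propagated <- expr
for (p in names(subunits)) {
  for (ch in subunits[[p]]) {
    propagated[, ch] <- pmax(propagated[, ch], propagated[, p])
  }
}
```

After this step, `gA` is marked in both subunits, while `gB` stays restricted to the subunit it was annotated to. For a hierarchy with more than two levels, the parents in such a map would need to be processed top-down so that expression reaches the lowest subunits.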


Tomancak, Genome Biol. 2007;8(7):R145



[Package cellOrigins version 0.1.3 Index]