| sgd {intervals} | R Documentation | 
Yeast gene model sample data
Description
This data set contains a data frame describing a subset of the chromosome feature data represented in the Fall 2007 version of ‘saccharomyces_cerevisiae.gff’, available for download from the Saccharomyces Genome Database (https://www.yeastgenome.org:443/).
Usage
data(sgd)
Format
A data frame with 14080 observations on the following 8 variables.
- SGDID
- SGD feature ID. 
- type
- Only four feature types have been retained: "CDS", "five_prime_UTR_intron", "intron", and "ORF". Note that "ORF" corresponds to a whole gene, while "CDS" corresponds to an exon. S. cerevisiae does not, however, have many multi-exonic genes.
- feature_name
- A character vector 
- parent_feature_name
- The feature_name of the larger element to which the current feature belongs. All retained "CDS" entries, for example, belong to an "ORF" entry.
- chr
- The chromosome on which the feature occurs.
- start
- Feature start base. 
- stop
- Feature stop base. 
- strand
- Is the feature on the Watson or Crick strand? 
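The relationship between "ORF" and "CDS" rows described above can be checked with base R alone. The following is a minimal sketch that uses a small made-up data frame, toy, with the same eight column names as sgd. Every row, identifier, and coordinate in it is hypothetical, including the use of NA as the parent of an "ORF" row. With the package attached, the same expressions could be tried on sgd itself.

```r
# Toy stand-in for sgd: same column names, made-up rows (not real SGD data).
toy <- data.frame(
    SGDID               = c( "S1", "S2", "S3", "S4", "S5", "S6" ),
    type                = c( "ORF", "CDS", "ORF", "CDS", "CDS", "intron" ),
    feature_name        = c( "geneA", "geneA_cds", "geneB",
                             "geneB_cds1", "geneB_cds2", "geneB_intron" ),
    parent_feature_name = c( NA, "geneA", NA, "geneB", "geneB", "geneB" ),
    chr                 = "chr01",
    start               = c( 100, 100, 900, 900, 1100, 1001 ),
    stop                = c( 400, 400, 1500, 1000, 1500, 1099 ),
    strand              = "W",
    stringsAsFactors    = FALSE
    )

cds <- subset( toy, type == "CDS" )
orf <- subset( toy, type == "ORF" )

# Every CDS row should name an ORF row as its parent.
all_cds_have_orf <- all( cds$parent_feature_name %in% orf$feature_name )

# Count CDS (exon) rows per ORF; a count above one marks a multi-exonic gene.
cds_per_orf  <- table( cds$parent_feature_name )
multi_exonic <- names( cds_per_orf )[ cds_per_orf > 1 ]

# Each CDS should lie within the coordinates of its parent ORF.
parent <- orf[ match( cds$parent_feature_name, orf$feature_name ), ]
cds_within_orf <- all( cds$start >= parent$start & cds$stop <= parent$stop )
```

Here multi_exonic lists the parent names that have more than one "CDS" row, which is one way to see how few multi-exonic genes a chromosome has.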
Examples
# An example to compute "promoters", defined to be the 500 bases
# upstream from an ORF annotation, provided these bases don't intersect
# another ORF. See documentation for the sgd data set for more details
# on the annotation set.
use_chr <- "chr01"
data( sgd )
sgd <- subset( sgd, chr == use_chr )
orf <- Intervals(
                 subset( sgd, type == "ORF", c( "start", "stop" ) ),
                 type = "Z"
                 )
rownames( orf ) <- subset( sgd, type == "ORF" )$feature_name
W <- subset( sgd, type == "ORF", "strand" ) == "W"
promoters_W <- Intervals(
                         cbind( orf[W,1] - 500, orf[W,1] - 1 ),
                         type = "Z"
                         )
promoters_W <- interval_intersection(
                                     promoters_W,
                                     interval_complement( orf )
                                     )
# Many Watson-strand genes have another ORF upstream at a distance of
# less than 500 bp
hist( size( promoters_W ) )
# All CDS entries are completely within their corresponding ORF entry.
cds_W <- Intervals(
                 subset( sgd, type == "CDS" & strand == "W", c( "start", "stop" ) ),
                 type = "Z"
                 )
rownames( cds_W ) <- NULL
interval_intersection( cds_W, interval_complement( orf[W,] ) )