sgd {intervals} | R Documentation |
Yeast gene model sample data
Description
This data set contains a data frame describing a subset of the chromosome feature data represented in Fall 2007 version of ‘saccharomyces_cerevisiae.gff’, available for download from the Saccharomyces Genome Database (https://www.yeastgenome.org:443/).
Usage
data(sgd)
Format
A data frame with 14080 observations on the following 8 variables.
SGDID
SGD feature ID.
type
-
Only four feature types have been retatined:
"CDS"
,"five_prime_UTR_intron"
,"intron"
, and"ORF"
. Note that"ORF"
correspond to a whole gene while"CDS"
, to an exon. S. cerevisae does not, however, have many multi-exonic genes. feature_name
A character vector
parent_feature_name
-
The
feature_name
of the a larger element to which the current feature belongs. All retained"CDS"
entries, for example, belong to an"ORF"
entry. chr
-
The chromosome on which the feature occurs.
start
Feature start base.
stop
Feature stop base.
strand
Is the feature on the Watson or Crick strand?
Examples
# An example to compute "promoters", defined to be the 500 bases
# upstream from an ORF annotation, provided these bases don't intersect
# another orf. See documentation for the sgd data set for more details
# on the annotation set.
use_chr <- "chr01"
data( sgd )
sgd <- subset( sgd, chr == use_chr )
orf <- Intervals(
subset( sgd, type == "ORF", c( "start", "stop" ) ),
type = "Z"
)
rownames( orf ) <- subset( sgd, type == "ORF" )$feature_name
W <- subset( sgd, type == "ORF", "strand" ) == "W"
promoters_W <- Intervals(
cbind( orf[W,1] - 500, orf[W,1] - 1 ),
type = "Z"
)
promoters_W <- interval_intersection(
promoters_W,
interval_complement( orf )
)
# Many Watson-strand genes have another ORF upstream at a distance of
# less than 500 bp
hist( size( promoters_W ) )
# All CDS entries are completely within their corresponding ORF entry.
cds_W <- Intervals(
subset( sgd, type == "CDS" & strand == "W", c( "start", "stop" ) ),
type = "Z"
)
rownames( cds_W ) <- NULL
interval_intersection( cds_W, interval_complement( orf[W,] ) )