readUCSCtable {wrProteo} | R Documentation |
Read annotation files from UCSC
Description
This function allows reading and importing genomic UCSC-annotation data.
Files can be read as default UCSC export or in GTF-format.
In the context of proteomics, we noticed that UniProt tables from UCSC are sometimes hard to match to identifiers from UniProt Fasta-files, i.e. many protein-identifiers won't match.
For this reason, additional support is given for reading 'Genes and Gene Predictions': since this table does not include protein-identifiers, a non-redundant list of ENSxxx transcript identifiers
can be exported as a file for an additional step of conversion, e.g. using a batch conversion tool at the UniProt site.
The initial genomic annotation can then be complemented using readUniProtExport.
Using this more elaborate route, we found higher coverage when adding genomic annotation to proteomics results whose annotation is based on an initial Fasta-file.
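The idea of exporting a non-redundant identifier list can be illustrated with a minimal base-R sketch. It is not the internal code of readUCSCtable(); the toy GTF lines, the regular expression, and the output file name are assumptions made only for illustration.

```r
# Toy GTF-style lines (tab-separated, 9 columns); all values are invented
gtfLines <- c(
  "chr11\thg38_knownGene\texon\t168962\t169034\t0\t-\t.\tgene_id \"ENST00000410108.5\"; transcript_id \"ENST00000410108.5\";",
  "chr11\thg38_knownGene\texon\t169500\t169700\t0\t-\t.\tgene_id \"ENST00000410108.5\"; transcript_id \"ENST00000410108.5\";",
  "chr11\thg38_knownGene\texon\t180000\t180200\t0\t+\t.\tgene_id \"ENST00000342878.3\"; transcript_id \"ENST00000342878.3\";"
)

# quote="" keeps the double quotes of the attribute column intact
gtf <- read.delim(text = gtfLines, header = FALSE, sep = "\t",
                  quote = "", stringsAsFactors = FALSE)

# Extract the value following 'gene_id' from the 9th (attribute) column
geneId <- sub('.*gene_id "([^"]+)".*', "\\1", gtf[[9]])

# Keep each ENSxxx identifier once (non-redundant list)
ensId <- unique(geneId[grepl("^ENS", geneId)])

# Write one identifier per line, e.g. for a batch conversion tool
outFi <- file.path(tempdir(), "ensIdsForConversion.txt")   # hypothetical file name
writeLines(ensId, outFi)
```

A file written this way contains one identifier per line, which is a layout batch conversion tools commonly accept; the exact format written by readUCSCtable() may differ.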
Usage
readUCSCtable(
fiName,
exportFileNa = NULL,
gtf = NA,
simplifyCols = c("gene_id", "chr", "start", "end", "strand", "frame"),
silent = FALSE,
debug = FALSE,
callFrom = NULL
)
Arguments
fiName |
(character) name (and path) of file to read |
exportFileNa |
(character) optional file-name to be exported, if |
gtf |
(logical) specify if file |
simplifyCols |
(character) optional list of column-names to be used for simplification (if 6 column-headers are given): the 1st value identifies the column used as reference to summarize all lines with the same ID; for the 2nd (typically chromosome names) a representative value is taken; for the 3rd (typically gene start site) the minimum is taken; for the 4th (typically gene end site) the maximum is taken; for the 5th and 6th a representative value is reported |
silent |
(logical) suppress messages |
debug |
(logical) display additional messages for debugging |
callFrom |
(character) allow easier tracking of message(s) produced |
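The summarization described for simplifyCols can be sketched in base R as follows. This is an illustration only, not the function's actual implementation: the toy data are invented, and taking the first value per ID as the "representative value" is an assumption.

```r
# Toy annotation with several lines (e.g. exons) per gene_id; all values are invented
annot <- data.frame(
  gene_id = c("ENST0001", "ENST0001", "ENST0002", "ENST0002", "ENST0002"),
  chr     = c("chr11", "chr11", "chr11", "chr11", "chr11"),
  start   = c(1000, 1500, 5200, 5000, 5600),
  end     = c(1200, 1800, 5300, 5100, 5900),
  strand  = c("+", "+", "-", "-", "-"),
  frame   = c(".", ".", ".", ".", "."),
  stringsAsFactors = FALSE
)
simplifyCols <- c("gene_id", "chr", "start", "end", "strand", "frame")

# Assumption: the 'representative value' is simply the first value per ID
firstVal <- function(x) x[1]

# 1st column name: reference ID used to group all lines
ids <- factor(annot[[simplifyCols[1]]])

simplified <- data.frame(
  gene_id = levels(ids),
  chr     = as.vector(tapply(annot[[simplifyCols[2]]], ids, firstVal)),  # representative
  start   = as.vector(tapply(annot[[simplifyCols[3]]], ids, min)),       # minimum start
  end     = as.vector(tapply(annot[[simplifyCols[4]]], ids, max)),       # maximum end
  strand  = as.vector(tapply(annot[[simplifyCols[5]]], ids, firstVal)),  # representative
  frame   = as.vector(tapply(annot[[simplifyCols[6]]], ids, firstVal)),  # representative
  stringsAsFactors = FALSE
)
simplified
```

Each ID ends up on a single line spanning from its smallest start to its largest end.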
Value
This function returns a matrix; optionally, the file 'exportFileNa' may be written.
See Also
Examples
path1 <- system.file("extdata", package="wrProteo")
gtfFi <- file.path(path1, "UCSC_hg38_chr11extr.gtf.gz")
# here we'll write the file for UniProt conversion to tempdir() to keep things tidy
expFi <- file.path(tempdir(), "deUcscForUniProt2.txt")
UcscAnnot1 <- readUCSCtable(gtfFi, exportFileNa=expFi)
## results can be further combined with readUniProtExport()
deUniProtFi <- file.path(path1, "deUniProt_hg38chr11extr.tab")
deUniPr1 <- readUniProtExport(deUniProtFi, deUcsc=UcscAnnot1,
targRegion="chr11:1-135,086,622")
deUniPr1[1:5,-5]