readUniProtExport {wrProteo}    R Documentation
Read protein annotation as exported from UniProt batch-conversion
Description
This function allows reading and importing protein-ID conversion results from UniProt.
To do so, first copy/paste your query IDs into UniProt 'Retrieve/ID mapping' field called '1. Provide your identifiers' (or upload as file), verify '2. Select options'.
In a typical case of 'enst000xxx' IDs you may keep the default settings, i.e. 'Ensembl Transcript' as input and 'UniProtKB' as output. Then 'Submit' your search and retrieve the results via
'Download', where you need to select the 'Tab-separated' format. If you download as 'Compressed', you need to decompress the .gz file before running this function (readUniProtExport).
In addition, UCSC annotation (Ensrnot accessions and chromosomal locations, as obtained using readUCSCtable) can be integrated.
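As a rough illustration of what a 'Tab-separated' download looks like once imported, the following base-R sketch reads a small made-up export with read.delim(). The headers and IDs are invented for illustration and need not match a real UniProt download, and this is not necessarily how readUniProtExport reads the file internally. Note that read.delim() (with its default check.names = TRUE) turns headers such as 'Entry name' into syntactic names such as 'Entry.name', which matches the spelling used for the default useUniPrCol columns.

```r
# Made-up, minimal tab-separated export (NOT real UniProt data)
txt <- paste(
  "EnsID\tEntry\tEntry name\tStatus\tProtein names\tGene names\tLength",
  "ENST_TOY1\tQTOY01\tTOY1_HUMAN\treviewed\tToy protein 1\tTOY1\t120",
  "ENST_TOY2\tQTOY02\tTOY2_HUMAN\tunreviewed\tToy protein 2\tTOY2\t85",
  sep = "\n")

# read.delim() expects tab-separated text with a header line;
# check.names = TRUE (the default) converts "Entry name" to "Entry.name"
uni <- read.delim(text = txt, stringsAsFactors = FALSE)
names(uni)
```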
Usage
readUniProtExport(
UniProtFileNa,
deUcsc = NULL,
targRegion = NULL,
useUniPrCol = NULL,
silent = FALSE,
debug = FALSE,
callFrom = NULL
)
Arguments
UniProtFileNa
(character) name (and path) of the file exported from UniProt (tab-separated text file including headers)
deUcsc
(data.frame) object produced by readUCSCtable
targRegion
(character or list) optional marking of chromosomal locations as part of a given chromosomal target region,
may be given as character like "chr11:1-135,086,622" or as list like list("chr1", pos=c(198110001,198570000))
useUniPrCol
(character) optional declaration of which columns from the UniProt exported file should be used/imported (default 'EnsID','Entry','Entry.name','Status','Protein.names','Gene.names','Length').
silent
(logical) suppress messages
debug
(logical) display additional messages for debugging
callFrom
(character) allow easier tracking of message(s) produced
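Selecting only the columns declared in useUniPrCol amounts to subsetting the imported table by name. The base-R sketch below assumes the table was read with syntactic column names (as read.delim() produces); the toy data are made up, and reporting missing columns via message() is an illustrative choice, not documented wrProteo behaviour.

```r
# Default column set as listed for useUniPrCol
useUniPrCol <- c("EnsID", "Entry", "Entry.name", "Status",
                 "Protein.names", "Gene.names", "Length")

# Toy table with one extra column ("Organism") and one missing ("Status")
uni <- data.frame(EnsID = "ENST_TOY1", Entry = "QTOY01", Entry.name = "TOY1_HUMAN",
                  Protein.names = "Toy protein 1", Gene.names = "TOY1",
                  Length = 120L, Organism = "Homo sapiens",
                  stringsAsFactors = FALSE)

keep <- intersect(useUniPrCol, colnames(uni))       # keeps the declared order
missingCols <- setdiff(useUniPrCol, colnames(uni))  # declared but not present
if (length(missingCols) > 0)
  message("Columns not found: ", paste(missingCols, collapse = ", "))
uniSel <- uni[, keep, drop = FALSE]                 # extra columns are dropped
colnames(uniSel)
```

Using intersect() rather than indexing with the full vector avoids an "undefined columns selected" error when a download lacks one of the declared columns.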
Details
In a typical use case, chromosomal location annotation is first extracted from UCSC for the species of interest and imported into R using readUCSCtable.
However, the tables provided by UCSC do not contain UniProt IDs, so an additional (batch-)conversion step is needed.
For this reason, readUCSCtable can write a file with Ensembl transcript IDs, which can then be converted to UniProt IDs on the UniProt website.
Then, UniProt annotation (downloaded as tab-separated) can be imported and combined with the genomic annotation using this function.
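Conceptually, combining both sources is a join on the Ensembl transcript ID. The sketch below uses base R merge() on two made-up tables; the key columns EnsID (UniProt side) and Ensrnot (UCSC side) are taken from the column names listed under Value, but that the package joins on exactly these columns, and in this way, is an assumption.

```r
# Made-up UniProt conversion result (one row per Ensembl transcript)
uni <- data.frame(EnsID = c("ENST_TOY1", "ENST_TOY2", "ENST_TOY3"),
                  Entry = c("QTOY01", "QTOY02", "QTOY03"),
                  stringsAsFactors = FALSE)

# Made-up genomic annotation, similar in spirit to readUCSCtable() output
ucsc <- data.frame(Ensrnot = c("ENST_TOY1", "ENST_TOY3"),
                   chr = c("chr11", "chr11"),
                   start = c(1000, 5000), end = c(2000, 5600),
                   stringsAsFactors = FALSE)

# Left join: keep every UniProt row; transcripts without annotation get NA
combined <- merge(uni, ucsc, by.x = "EnsID", by.y = "Ensrnot", all.x = TRUE)
combined
```

Note that merge() keeps only one key column (named after by.x), whereas the documented output lists both $EnsID and $Ensrnot.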
Value
This function returns a data.frame with the columns $EnsID, $Entry, $Entry.name, $Status, $Protein.names, $Gene.names and $Length; if deUcsc is integrated, the columns $chr, $type, $start, $end, $score, $strand, $Ensrnot and $avPos are added.
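The column $avPos is not described further here. Assuming it denotes the average (midpoint) of $start and $end, it could be computed as below, and the same midpoint can be used to flag locations inside a target region. Both the midpoint interpretation and the hypothetical inTarget column are assumptions for illustration only.

```r
# Toy annotation with some of the genomic columns listed under Value (made up)
annot <- data.frame(chr = c("chr11", "chr11", "chr1"),
                    start = c(100, 5000, 100), end = c(300, 5400, 300),
                    stringsAsFactors = FALSE)

# Assumption: avPos = midpoint of start and end
annot$avPos <- (annot$start + annot$end) / 2

# Hypothetical target region on chr11 between positions 1 and 1000;
# a location counts as inside if its midpoint lies within the region
targ <- list(chr = "chr11", start = 1, end = 1000)
annot$inTarget <- annot$chr == targ$chr &
  annot$avPos >= targ$start & annot$avPos <= targ$end
annot
```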
See Also
Examples
library(wrProteo)
path1 <- system.file("extdata", package="wrProteo")
deUniProtFi <- file.path(path1,"deUniProt_hg38chr11extr.tab")
deUniPr1a <- readUniProtExport(deUniProtFi)
str(deUniPr1a)
## Workflow starting with UCSC annotation (gtf) files :
gtfFi <- file.path(path1,"UCSC_hg38_chr11extr.gtf.gz")
UcscAnnot1 <- readUCSCtable(gtfFi)
## Results of conversion at UniProt are already available (file "deUniProt_hg38chr11extr.tab")
myTargRegion <- list("chr1", pos=c(198110001,198570000))
myTargRegion2 <-"chr11:1-135,086,622" # works equally well
deUniPr1 <- readUniProtExport(deUniProtFi,deUcsc=UcscAnnot1,
targRegion=myTargRegion)
## Now UniProt IDs and genomic locations are both available :
str(deUniPr1)
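The two forms of targRegion used above (myTargRegion as a list, myTargRegion2 as a character string) carry the same kind of information. The helper toRegion() below is hypothetical and only sketches, in base R, one way to bring both forms to a common chromosome/start/end representation; it is not how wrProteo parses the argument internally.

```r
# Hypothetical helper (not part of wrProteo): normalise a target region given
# either as "chrN:start-end" (commas allowed) or as list("chrN", pos = c(start, end))
toRegion <- function(x) {
  if (is.list(x)) {
    return(list(chr = as.character(x[[1]]),
                start = as.numeric(x$pos[1]), end = as.numeric(x$pos[2])))
  }
  x <- gsub(",", "", x, fixed = TRUE)              # drop thousands separators
  chr <- sub(":.*$", "", x)                        # part before the colon
  rng <- strsplit(sub("^[^:]*:", "", x), "-", fixed = TRUE)[[1]]
  list(chr = chr, start = as.numeric(rng[1]), end = as.numeric(rng[2]))
}

str(toRegion("chr11:1-135,086,622"))
str(toRegion(list("chr1", pos = c(198110001, 198570000))))
```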