R: Read protein annotation as exported from UniProt...

readUniProtExport {wrProteo}

R Documentation

Read protein annotation as exported from UniProt batch-conversion

Description

This function reads and imports protein-ID conversion results from UniProt. To produce such a file, first copy/paste your query IDs into the UniProt 'Retrieve/ID mapping' field called '1. Provide your identifiers' (or upload them as a file), then verify '2. Select options'. In a typical case of 'enst000xxx' IDs you may leave the default settings, i.e. 'Ensembl Transcript' as input and 'UniProt KB' as output. Then 'Submit' your search and retrieve the results via 'Download'; you need to specify the 'Tab-separated' format! If you download as 'Compressed', you need to decompress the .gz file before reading it with this function. In addition, a file with UCSC annotation (Ensrnot accessions and chromosomic locations, obtained using readUCSCtable) can be integrated.
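The decompression step mentioned above can be done without leaving R. The following is a minimal sketch using base R only; `decompressGz` is a hypothetical helper (not part of wrProteo), and the file content and IDs are made-up placeholders used only to make the example self-contained.

```r
## Hypothetical helper (not part of wrProteo): decompress a gzipped text file using base R
decompressGz <- function(gzPath, outPath = sub("\\.gz$", "", gzPath)) {
  con <- gzfile(gzPath, open = "rt")      # open the compressed file for reading text
  on.exit(close(con))
  writeLines(readLines(con), outPath)     # write the plain-text copy
  invisible(outPath)
}

## Self-contained demonstration with a temporary, made-up tab-separated file
tmpTab <- tempfile(fileext = ".tab")
writeLines(c("From\tEntry", "ENST00000000001\tP00001"), tmpTab)
tmpGz <- paste0(tmpTab, ".gz")
gzCon <- gzfile(tmpGz, open = "wt")
writeLines(readLines(tmpTab), gzCon)      # create the compressed version
close(gzCon)
unlink(tmpTab)                            # keep only the .gz file

outFi <- decompressGz(tmpGz)              # restores the uncompressed .tab file
readLines(outFi)
```

The resulting uncompressed file name could then be passed as `UniProtFileNa`.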

Usage

readUniProtExport(
  UniProtFileNa,
  deUcsc = NULL,
  targRegion = NULL,
  useUniPrCol = NULL,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)

Arguments

`UniProtFileNa`	(character) name (and path) of file exported from UniProt (tabulated text file including headers)
`deUcsc`	(data.frame) object produced by `readUCSCtable` to be combined with data from `UniProtFileNa`
`targRegion`	(character or list) optional marking of chromosomal locations to be part of a given chromosomal target region, may be given as character like `chr11:1-135,086,622` or as `list` with a first component characterizing the chromosome and an integer vector with start and end sites
`useUniPrCol`	(character) optional declaration of which columns from the UniProt exported file should be used/imported (default 'EnsID','Entry','Entry.name','Status','Protein.names','Gene.names','Length').
`silent`	(logical) suppress messages
`debug`	(logical) display additional messages for debugging
`callFrom`	(character) allow easier tracking of message(s) produced
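
The two accepted forms of `targRegion` carry the same information. As an illustration, the sketch below converts the character form into the list form using base R; `parseRegion` is a hypothetical helper written for this example, and the parsing done inside wrProteo may differ.

```r
## Hypothetical helper (not part of wrProteo): turn "chr:start-end" into the list form
parseRegion <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)[[1]]             # chromosome vs. position range
  posTxt <- strsplit(parts[2], "-", fixed = TRUE)[[1]]     # start and end as text
  pos <- as.integer(gsub(",", "", posTxt))                 # drop thousands separators
  list(parts[1], pos = pos)
}

reg <- parseRegion("chr11:1-135,086,622")
str(reg)
```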

Details

In a typical use case, chromosomic location annotation is first extracted from UCSC for the species of interest and imported to R using readUCSCtable. However, the tables provided by UCSC don't contain UniProt IDs. Thus, an additional (batch-)conversion step is needed. For this reason readUCSCtable allows writing a file with Ensembl transcript IDs which can be converted to UniProt IDs at the UniProt site. Then, the UniProt annotation (downloaded as tab-separated) can be imported and combined with the genomic annotation using this function.
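
Conceptually, combining both sources amounts to joining two tables on the Ensembl transcript ID. The sketch below shows this idea with base R `merge()` on toy data; it is not the implementation used by this function, and all IDs, accessions and positions are made-up placeholders.

```r
## Toy stand-in for a UniProt conversion result (placeholder values)
uniProtToy <- data.frame(
  EnsID = c("ENST00000000001", "ENST00000000002"),
  Entry = c("P00001", "P00002"),
  stringsAsFactors = FALSE)

## Toy stand-in for genomic annotation (placeholder values)
ucscToy <- data.frame(
  EnsID = c("ENST00000000002", "ENST00000000003"),
  chr = c("chr11", "chr11"),
  start = c(1000L, 5000L),
  end = c(2000L, 6000L),
  stringsAsFactors = FALSE)

## Keep all UniProt rows; genomic columns are NA where no location is known
combined <- merge(uniProtToy, ucscToy, by = "EnsID", all.x = TRUE)
combined
```

Using `all.x = TRUE` keeps proteins without a matching genomic entry, which is one possible design choice; an inner join would drop them instead.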

Value

This function returns a data.frame with columns $EnsID, $Entry, $Entry.name, $Status, $Protein.names, $Gene.names, $Length; if deUcsc is integrated, it additionally contains $chr, $type, $start, $end, $score, $strand, $Ensrnot, $avPos.

Examples

path1 <- system.file("extdata",package="wrProteo")
deUniProtFi <- file.path(path1,"deUniProt_hg38chr11extr.tab")
deUniPr1a <- readUniProtExport(deUniProtFi) 
str(deUniPr1a)

## Workflow starting with UCSC annotation (gtf) files :
gtfFi <- file.path(path1,"UCSC_hg38_chr11extr.gtf.gz")
UcscAnnot1 <- readUCSCtable(gtfFi)
## Results of conversion at UniProt are already available (file "deUniProt_hg38chr11extr.tab")
myTargRegion <- list("chr1", pos=c(198110001,198570000))
myTargRegion2 <- "chr11:1-135,086,622"      # works equally well
deUniPr1 <- readUniProtExport(deUniProtFi,deUcsc=UcscAnnot1,
  targRegion=myTargRegion)
## Now UniProt IDs and genomic locations are both available :
str(deUniPr1)

[Package wrProteo version 1.12.0 Index]