R: Amino acid compositions of human proteins

human {canprot}

R Documentation

Amino acid compositions of human proteins

Description

Amino acid compositions of human proteins derived from UniProt.

Format

human.aa is a data frame with 25 columns in the format used for amino acid compositions in CHNOSZ (see thermo):

`protein`	character	Identification of protein
`organism`	character	Identification of organism
`ref`	character	Reference key for source of sequence data
`abbrv`	character	Abbreviation or other ID for protein (e.g. gene name)
`chains`	numeric	Number of polypeptide chains in the protein
`Ala`...`Tyr`	numeric	Number of each amino acid in the protein

The protein column contains UniProt IDs in the format database|accession-isoform, where database is most often ‘sp’ (Swiss-Prot) or ‘tr’ (TrEMBL), and isoform is an optional suffix indicating the isoform of the protein (particularly in the human.additional file).
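
The ID format described above can be split into its parts with base R string functions. The following is a minimal sketch and not part of canprot: the helper name split_uniprot_id is hypothetical, the example IDs are for illustration only, and the code assumes that the isoform suffix, when present, is the text after the hyphen.

```r
# Hypothetical helper: split IDs of the form database|accession-isoform
split_uniprot_id <- function(id) {
  # Text before the first "|" is the database (e.g. "sp" or "tr")
  database <- sub("\\|.*$", "", id)
  # Text after the first "|" is the accession with an optional isoform suffix
  rest <- sub("^[^|]*\\|", "", id)
  # Assumption: the isoform, if present, is the part after the hyphen
  has_isoform <- grepl("-", rest, fixed = TRUE)
  accession <- sub("-.*$", "", rest)
  isoform <- ifelse(has_isoform, sub("^[^-]*-", "", rest), NA_character_)
  data.frame(database, accession, isoform, stringsAsFactors = FALSE)
}

# Example IDs for illustration only
ids <- c("sp|P04637", "sp|P04637-2", "tr|A0A024R161")
parts <- split_uniprot_id(ids)
print(parts)
```

IDs without an isoform suffix get NA in the isoform column, so canonical and isoform entries can be told apart with is.na(parts$isoform).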

Details

The amino acid compositions of human proteins are stored in three files under extdata/protein.

human.base.rds contains amino acid compositions of canonical isoforms of manually reviewed proteins in the UniProt reference human proteome (computed from sequences in UP000005640_9606.fasta.gz, dated 2016-04-03).
human.additional.rds contains amino acid compositions of additional proteins (UP000005640_9606_additional.fasta.gz) including isoforms and unreviewed sequences. In version 0.1.5, this file was trimmed to include only those proteins that are used in any of the datasets in the package.
human.extra.csv contains amino acid compositions of other (“extra”) proteins that are used in a dataset but not listed in either of the files above. These proteins may include obsolete or unreviewed entries, or newer additions to the UniProt database. Most, but not all, of the sequences here are HUMAN (see the organism column; the ref column gives the reference keys).

On loading the package, the individual data files are read and combined, and the result is assigned to the human.aa object in the canprot environment.
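
The read-and-combine step described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's actual loading code: it writes toy data frames (made-up IDs and values, with only two of the twenty amino acid columns) to a temporary directory, assumes the three files share the same columns, and uses a stand-in environment named toy_env in place of the canprot environment.

```r
# Toy amino acid composition table (made-up values; only Ala and Tyr shown)
toy_aa <- function(protein, ref) {
  data.frame(protein = protein, organism = "HUMAN", ref = ref,
             abbrv = "TOY", chains = 1, Ala = 10, Tyr = 2,
             stringsAsFactors = FALSE)
}

# Write stand-ins for the three data files to a temporary directory
datadir <- file.path(tempdir(), "protein")
dir.create(datadir, showWarnings = FALSE)
saveRDS(toy_aa("sp|X00001", "base"), file.path(datadir, "human.base.rds"))
saveRDS(toy_aa("tr|X00002", "additional"),
        file.path(datadir, "human.additional.rds"))
write.csv(toy_aa("sp|X00003", "extra"),
          file.path(datadir, "human.extra.csv"), row.names = FALSE)

# Read the individual files and combine them by rows
base <- readRDS(file.path(datadir, "human.base.rds"))
additional <- readRDS(file.path(datadir, "human.additional.rds"))
extra <- read.csv(file.path(datadir, "human.extra.csv"), as.is = TRUE)
combined <- rbind(base, additional, extra)

# Assign the result in a separate environment (stand-in for canprot)
toy_env <- new.env()
assign("human.aa", combined, envir = toy_env)
nrow(get("human.aa", toy_env))
```

Keeping the combined table in its own environment means the object is retrieved with get(), as in the example for this help page, rather than from the global workspace.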

Examples

# The number of proteins
nrow(get("human.aa", canprot))

[Package canprot version 2.0.0 Index]
