human {canprot}                R Documentation

Amino Acid Compositions of Human Proteins


Data for amino acid compositions of proteins and conversion from old to new UniProt IDs.


human_aa is a data frame with 25 columns in the format used for amino acid compositions in CHNOSZ (see thermo):

protein     character   Identification of protein
organism    character   Identification of organism
ref         character   Reference key for source of sequence data
abbrv       character   Abbreviation or other ID for protein (e.g. gene name)
chains      numeric     Number of polypeptide chains in the protein
Ala...Tyr   numeric     Number of each amino acid in the protein
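As a rough illustration of this layout, the sketch below builds a one-row data frame with the same 25 columns and sums the amino acid counts to get the protein length. The protein ID, organism label, abbreviation, and counts are made-up placeholders, not values from the package.

```r
# Amino acid columns in the order Ala ... Tyr
aa_names <- c("Ala", "Cys", "Asp", "Glu", "Phe", "Gly", "His", "Ile", "Lys", "Leu",
              "Met", "Asn", "Pro", "Gln", "Arg", "Ser", "Thr", "Val", "Trp", "Tyr")

# Hypothetical counts for a made-up protein (not real data)
counts <- setNames(
  as.list(c(5, 1, 3, 4, 2, 6, 1, 3, 4, 7, 2, 3, 4, 2, 3, 5, 4, 5, 1, 2)),
  aa_names
)

# Five identifying columns followed by the 20 amino acid columns
toy_aa <- data.frame(
  protein = "sp|X00000", organism = "HUMAN", ref = NA_character_,
  abbrv = "TOY1", chains = 1, counts, stringsAsFactors = FALSE
)

ncol(toy_aa)  # 25 columns
# Protein length is the sum over the amino acid columns
protein_length <- rowSums(toy_aa[, aa_names])
```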

The protein column contains UniProt IDs in the format database|accession-isoform, where database is most often sp (Swiss-Prot) or tr (TrEMBL), and isoform is an optional suffix indicating the isoform of the protein (particularly in the human_additional file).
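The ID format described above can be split into its parts with base R string functions. The following is only a sketch: parse_uniprot_id is a hypothetical helper that is not part of canprot, and the IDs are invented for illustration.

```r
# Made-up IDs in the database|accession-isoform format (illustration only)
ids <- c("sp|X00001", "tr|A0A000XYZ1", "sp|X00002-2")

# Hypothetical helper: split each ID into database, accession, and isoform
parse_uniprot_id <- function(id) {
  parts <- strsplit(id, "|", fixed = TRUE)
  database <- vapply(parts, `[`, character(1), 1)
  acc_iso <- vapply(parts, `[`, character(1), 2)
  # The isoform suffix is optional: text after the first hyphen, otherwise NA
  has_iso <- grepl("-", acc_iso, fixed = TRUE)
  data.frame(
    database = database,
    accession = sub("-.*$", "", acc_iso),
    isoform = ifelse(has_iso, sub("^[^-]*-", "", acc_iso), NA_character_),
    stringsAsFactors = FALSE
  )
}

parsed <- parse_uniprot_id(ids)
```

IDs without a hyphen get NA in the isoform column, so the same function handles both canonical and isoform-specific entries.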


The amino acid compositions of human proteins are stored in three files under extdata/protein.

On loading the package, the individual data files are read and combined, and the result is assigned to the human_aa object in the human environment.
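The read-and-combine step can be pictured with the sketch below. It does not reproduce the package's loading code: the file names, the reduced set of columns, and the toy_env environment are all assumptions made for the example, which writes its own temporary files so that it runs without canprot installed.

```r
# Write three small placeholder CSV files to a temporary directory
tmpdir <- tempfile("protein_")
dir.create(tmpdir)
parts <- list(
  data.frame(protein = "sp|X00001", chains = 1, Ala = 10, stringsAsFactors = FALSE),
  data.frame(protein = "sp|X00002", chains = 1, Ala = 7, stringsAsFactors = FALSE),
  data.frame(protein = "tr|X00003-2", chains = 1, Ala = 3, stringsAsFactors = FALSE)
)
for (i in seq_along(parts)) {
  write.csv(parts[[i]], file.path(tmpdir, paste0("part", i, ".csv")), row.names = FALSE)
}

# Read the individual files and stack them into one data frame
files <- list.files(tmpdir, pattern = "\\.csv$", full.names = TRUE)
combined <- do.call(rbind, lapply(files, read.csv, stringsAsFactors = FALSE))

# Assign the combined table to an object inside a dedicated environment
toy_env <- new.env()
assign("human_aa", combined, envir = toy_env)
nrow(get("human_aa", toy_env))
```

Keeping the combined table in its own environment means it can be retrieved by name with get() without placing a large object in the user's workspace.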

As an aid for processing datasets that list old (obsolete) UniProt IDs, the corresponding new (current) IDs are stored in uniprot_updates. These ID mappings have been added manually as needed for individual datasets, and include proteins from humans as well as other organisms. check_IDs performs the conversion of old to new IDs.
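The idea of an old-to-new lookup can be sketched with match(). This is not the implementation of check_IDs: the update_ids helper, the column names old and new, and the IDs themselves are assumptions chosen for illustration.

```r
# Hypothetical mapping table; the column names "old" and "new" are assumptions
updates <- data.frame(
  old = c("X00001", "X00002"),
  new = c("Y00001", "Y00002"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace IDs found in the "old" column, keep the rest
update_ids <- function(ids, updates) {
  i <- match(ids, updates$old)
  found <- !is.na(i)
  ids[found] <- updates$new[i[found]]
  ids
}

updated <- update_ids(c("X00002", "Z00009", "X00001"), updates)
updated  # "Y00002" "Z00009" "Y00001"
```

IDs that have no entry in the mapping table pass through unchanged, which suits a table that is extended only as needed for individual datasets.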

See Also

Amino acid compositions of non-human proteins are stored under extdata/aa in directories archaea, bacteria, cow, dog, mouse, rat, and yeast. These files can be loaded in protcomp via the aa_file argument, which is used e.g. in pdat_osmotic_bact.


Examples

# Load the package so that the human environment is available
library(canprot)
# The number of proteins
nrow(get("human_aa", human))
# The number of old to new ID mappings
nrow(get("uniprot_updates", human))

[Package canprot version 1.1.0 Index]