metrics {canprot}

R Documentation

Calculate chemical metrics for proteins

Description

Calculate chemical metrics for proteins from their amino acid compositions.

Usage

  Zc(AAcomp, ...)
  nO2(AAcomp, basis = "QEC", ...)
  nH2O(AAcomp, basis = "QEC", terminal_H2O = 0)
  GRAVY(AAcomp, ...)
  pI(AAcomp, terminal_H2O = 1, ...)
  MW(AAcomp, terminal_H2O = 0, ...)
  pMW(AAcomp, terminal_H2O = 1, ...)
  V0(AAcomp, terminal_H2O = 0, ...)
  pV0(AAcomp, terminal_H2O = 1, ...)
  V0g(AAcomp, ...)
  Density(AAcomp, ...)
  S0(AAcomp, terminal_H2O = 0, ...)
  pS0(AAcomp, terminal_H2O = 1, ...)
  S0g(AAcomp, ...)
  SV(AAcomp, ...)
  Zcg(AAcomp, ...)
  nH2Og(AAcomp, ...)
  nO2g(AAcomp, ...)
  HC(AAcomp, ...)
  NC(AAcomp, ...)
  OC(AAcomp, ...)
  SC(AAcomp, ...)
  nC(AAcomp, ...)
  pnC(AAcomp, ...)
  plength(AAcomp, ...)
  Cost(AAcomp, ...)
  RespiratoryCost(AAcomp, ...)
  FermentativeCost(AAcomp, ...)
  B20Cost(AAcomp, ...)
  Y20Cost(AAcomp, ...)
  H11Cost(AAcomp, ...)
  cplab

Arguments

`AAcomp`	data frame, amino acid compositions
`...`	ignored additional arguments
`basis`	character, set of basis species
`terminal_H2O`	numeric, number of pairs of terminal groups (-H and -OH) to include

Details

Columns in AAcomp should be named with the three-letter abbreviations for the amino acids. Matching of the abbreviations is case-insensitive; e.g., ‘Ala’, ‘ALA’, and ‘ala’ all refer to alanine.
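The naming convention can be illustrated with a small base R sketch that locates amino acid columns without regard to case. This is not the package's internal code; the helper name find_aa_columns is hypothetical.

```r
# Hypothetical helper (not part of canprot): locate amino acid columns
# in a data frame regardless of the case used in the column names
find_aa_columns <- function(AAcomp, abbrevs) {
  match(tolower(abbrevs), tolower(colnames(AAcomp)))
}

# Column names written in three different cases
aa <- data.frame(ALA = 1, gly = 2, Ser = 0)
# Column indices of glycine and alanine
idx <- find_aa_columns(aa, c("Gly", "Ala"))
idx
```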

Metrics are normalized per amino acid residue except for Zc, pI, Density, plength, and other functions starting with p (for protein). The contribution of protein terminal groups (-H and -OH) to residue-normalized metrics is turned off by default. Set terminal_H2O to 1 (or to the number of polypeptide chains, if greater than one) to include their contribution.

The metrics are described below:

Zc

Average oxidation state of carbon (Z_C) (Dick, 2014). This metric is independent of the choice of basis species. Note that Z_C is normalized by number of carbon atoms, not by number of residues.
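As a worked illustration, for an uncharged formula with c, h, n, o, and s atoms of C, H, N, O, and S, Z_C = (-h + 3n + 2o + 2s) / c. The base R sketch below applies this to the residues of Gly-Ala-Gly. The residue formulas are written out by hand and the helper name Zc_from_formula is hypothetical; the package itself may organize the calculation differently.

```r
# Elemental counts of amino acid residues (amino acid minus H2O)
residues <- rbind(
  Gly = c(C = 2, H = 3, N = 1, O = 1, S = 0),
  Ala = c(C = 3, H = 5, N = 1, O = 1, S = 0)
)
# Composition of Gly-Ala-Gly
counts <- c(Gly = 2, Ala = 1)
# Total formula of the residues: C7H11N3O3
formula <- drop(counts %*% residues[names(counts), ])

# Hypothetical helper: Z_C of an uncharged formula
Zc_from_formula <- function(f) {
  unname((-f["H"] + 3 * f["N"] + 2 * f["O"] + 2 * f["S"]) / f["C"])
}

Zc_residues <- Zc_from_formula(formula)
# Adding one H2O for the terminal groups leaves the value unchanged
Zc_protein <- Zc_from_formula(formula + c(C = 0, H = 2, N = 0, O = 1, S = 0))
c(Zc_residues, Zc_protein)
```

Both values are 4/7, which shows why terminal_H2O has no effect on this metric.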

nO2

Stoichiometric oxidation state (n_O₂ per residue). The available basis species are:

‘QEC’ - glutamine, glutamic acid, cysteine, H₂O, O₂ (Dick et al., 2020)
‘QCa’ - glutamine, cysteine, acetic acid, H₂O, O₂
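The role of the basis species can be sketched as a linear system: the elemental formula of the protein is written as a combination of the five basis species, and the coefficients on H₂O and O₂ give the stoichiometric hydration and oxidation states. The base R sketch below does this for the ‘QEC’ basis and the residues of Gly-Ala-Gly. It is an illustration under the assumption that no terminal groups are included, not the package's own code, which may use pre-calculated values for each amino acid.

```r
# Elemental composition of the QEC basis species (rows are elements)
basis <- cbind(
  glutamine     = c(C = 5, H = 10, N = 2, O = 3, S = 0),
  glutamic_acid = c(C = 5, H = 9,  N = 1, O = 4, S = 0),
  cysteine      = c(C = 3, H = 7,  N = 1, O = 2, S = 1),
  H2O           = c(C = 0, H = 2,  N = 0, O = 1, S = 0),
  O2            = c(C = 0, H = 0,  N = 0, O = 2, S = 0)
)
# Residues of Gly-Ala-Gly without terminal groups: C7H11N3O3
target <- c(C = 7, H = 11, N = 3, O = 3, S = 0)
# Number of each basis species needed to make up the target formula
coeffs <- setNames(as.vector(solve(basis, target)), colnames(basis))
# Per-residue values for a chain of three residues
per_residue <- coeffs[c("H2O", "O2")] / 3
per_residue
```

For this formula the solution is 1.6 glutamine, -0.2 glutamic acid, 0 cysteine, -1.6 H₂O, and 0.3 O₂, so the per-residue values are about -0.533 for H₂O and 0.1 for O₂.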

nH2O

Stoichiometric hydration state (n_H₂O per residue). The basis species also affect this calculation.

GRAVY

Grand average of hydropathy. Values of the hydropathy index for individual amino acids are from Kyte and Doolittle (1982).
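A minimal base R sketch of the calculation, assuming it is a residue-weighted mean of the hydropathy index, is shown for Gly-Ala-Gly. Only the two Kyte-Doolittle values needed for this peptide are written out (Ala 1.8, Gly -0.4).

```r
# Kyte-Doolittle hydropathy index for the two amino acids used here
hydropathy <- c(Ala = 1.8, Gly = -0.4)
aa <- data.frame(Ala = 1, Gly = 2)
# Residue counts in the same order as the hydropathy values
counts <- unlist(aa[1, names(hydropathy)])
# Weighted mean of hydropathy over all residues
gravy <- sum(counts * hydropathy) / sum(counts)
gravy
```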

pI

Isoelectric point. The net charge for each ionizable group was pre-calculated from pH 0 to 14 at intervals of 0.01. The isoelectric point is found as the pH where the sum of charges of all groups in the protein is closest to zero. The pK values for the terminal groups and sidechains are taken from Bjellqvist et al. (1993) and Bjellqvist et al. (1994); note that the calculation does not implement position-specific adjustments described in the latter paper. The number of N- and C-terminal groups is taken from terminal_H2O.
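The grid search described above can be sketched in base R as follows. The sketch assumes Henderson-Hasselbalch charges for each group and considers only the terminal groups of a single chain. The pK values are round numbers chosen for illustration; they are not the Bjellqvist values used by the package.

```r
# Illustrative pK values only (not the Bjellqvist values)
pK_Nterm <- 8
pK_Cterm <- 4
# pH grid from 0 to 14 at intervals of 0.01
pH <- seq(0, 14, by = 0.01)
# Assumed charge model for cationic (+) and anionic (-) groups
charge_pos <- function(pH, pK) 1 / (1 + 10^(pH - pK))
charge_neg <- function(pH, pK) -1 / (1 + 10^(pK - pH))
# Number of chains, each with one N-terminus and one C-terminus
terminal_H2O <- 1
net_charge <- terminal_H2O *
  (charge_pos(pH, pK_Nterm) + charge_neg(pH, pK_Cterm))
# The isoelectric point is the grid value with net charge closest to zero
pI_sketch <- pH[which.min(abs(net_charge))]
pI_sketch
```

With these symmetric pK values the result is the midpoint, pH 6. Sidechain groups would be added to net_charge in the same way, each multiplied by the count of the corresponding amino acid.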

MW

Molecular weight.

pMW

Molecular weight per protein.
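The difference between the per-residue and per-protein values, and the effect of terminal_H2O, can be illustrated with a base R sketch for Gly-Ala-Gly. The atomic weights are approximate values supplied here as an assumption, and the variable names are hypothetical.

```r
# Approximate standard atomic weights (assumed values)
atomic_mass <- c(C = 12.011, H = 1.008, N = 14.007, O = 15.999)
# Residues of Gly-Ala-Gly without terminal groups: C7H11N3O3
residue_formula <- c(C = 7, H = 11, N = 3, O = 3)
n_residues <- 3
mass_residues <- sum(residue_formula * atomic_mass[names(residue_formula)])
mass_H2O <- 2 * atomic_mass[["H"]] + atomic_mass[["O"]]
# Per-residue value without terminal groups (terminal_H2O = 0)
MW_per_residue <- mass_residues / n_residues
# Per-protein value with one pair of terminal groups (terminal_H2O = 1)
MW_per_protein <- mass_residues + 1 * mass_H2O
c(MW_per_residue, MW_per_protein)
```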

V0

Standard molal volume. The values are derived from group contributions of amino acid sidechains and protein backbones (Dick et al., 2006).

pV0

Standard molal volume per protein.

V0g

Specific volume (reciprocal density).

Density

Density (MW / V0).

S0

Standard molal entropy. The values are derived from group contributions of amino acid sidechains and protein backbones (Dick et al., 2006).

pS0

Standard molal entropy per protein.

S0g

Specific entropy.

SV

Entropy density.

Zcg

Carbon oxidation state per gram.

nO2g

Stoichiometric oxidation state per gram.

nH2Og

Stoichiometric hydration state per gram.

HC

H/C ratio (not counting terminal -H and -OH groups).

NC

N/C ratio.

OC

O/C ratio (not counting terminal -H and -OH groups).

SC

S/C ratio.
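The four elemental ratios can be sketched in base R from the formula of the residues. The example uses Gly-Ala-Gly without terminal -H and -OH groups; the formula is written out by hand for illustration.

```r
# Residues of Gly-Ala-Gly without terminal groups: C7H11N3O3
formula <- c(C = 7, H = 11, N = 3, O = 3, S = 0)
# Ratio of each other element to carbon
ratios <- formula[c("H", "N", "O", "S")] / formula[["C"]]
names(ratios) <- c("HC", "NC", "OC", "SC")
ratios
```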

nC

Number of carbon atoms per residue.

pnC

Number of carbon atoms per protein.

plength

Protein length (number of amino acid residues).

Cost

Metabolic cost (Akashi and Gojobori, 2002).

RespiratoryCost

Respiratory cost (Wagner, 2005).

FermentativeCost

Fermentative cost (Wagner, 2005).

B20Cost

Biosynthetic cost in bacteria (Zhang et al., 2018).

Y20Cost

Biosynthetic cost in yeast (Zhang et al., 2018).

H11Cost

Biosynthetic cost in humans (Zhang et al., 2018).

... is provided to permit get or do.call constructions with the same arguments for all metrics. For instance, a terminal_H2O argument can be supplied to either Zc or nH2O, but it only has an effect on the latter.
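A short sketch of such a construction is given below, assuming the canprot package is installed. The choice of metrics and the variable names are arbitrary.

```r
library(canprot)

aa <- data.frame(Ala = 1, Gly = 2)
# Names of the metrics to calculate
metrics <- c("Zc", "nH2O", "GRAVY")
# The same argument list is passed to every function;
# terminal_H2O is ignored by Zc and GRAVY but used by nH2O
values <- sapply(metrics, function(m) {
  do.call(m, list(AAcomp = aa, terminal_H2O = 1))
})
values
```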

cplab is a list of formatted labels for each of the chemical metrics listed here. A check in the code ensures that the names of the functions for calculating metrics and the names of the labels listed in cplab are identical.

References

Akashi H, Gojobori T. 2002. Metabolic efficiency and amino acid composition in the proteomes of Escherichia coli and Bacillus subtilis. Proceedings of the National Academy of Sciences 99(6): 3695–3700. doi:10.1073/pnas.062526999

Bjellqvist B, Hughes GJ, Pasquali C, Paquet N, Ravier F, Sanchez J-C, Frutiger S, Hochstrasser D. 1993. The focusing positions of polypeptides in immobilized pH gradients can be predicted from their amino acid sequences. Electrophoresis 14: 1023–1031. doi:10.1002/elps.11501401163

Bjellqvist B, Basse B, Olsen E, Celis JE. 1994. Reference points for comparisons of two-dimensional maps of proteins from different human cell types defined in a pH scale where isoelectric points correlate with polypeptide compositions. Electrophoresis 15: 529–539. doi:10.1002/elps.1150150171

Dick JM, LaRowe DE, Helgeson HC. 2006. Temperature, pressure, and electrochemical constraints on protein speciation: Group additivity calculation of the standard molal thermodynamic properties of ionized unfolded proteins. Biogeosciences 3(3): 311–336. doi:10.5194/bg-3-311-2006

Dick JM. 2014. Average oxidation state of carbon in proteins. J. R. Soc. Interface 11: 20131095. doi:10.1098/rsif.2013.1095

Dick JM, Yu M, Tan J. 2020. Uncovering chemical signatures of salinity gradients through compositional analysis of protein sequences. Biogeosciences 17: 6145–6162. doi:10.5194/bg-17-6145-2020

Kyte J, Doolittle RF. 1982. A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157: 105–132. doi:10.1016/0022-2836(82)90515-0

Wagner A. 2005. Energy constraints on the evolution of gene expression. Molecular Biology and Evolution 22(6): 1365–1374. doi:10.1093/molbev/msi126

Zhang H, Wang Y, Li J, Chen H, He X, Zhang H, Liang H, Lu J. 2018. Biosynthetic energy cost for amino acids decreases in cancer evolution. Nature Communications 9(1): 4124. doi:10.1038/s41467-018-06461-1

Examples

# Amino acid composition of a tripeptide (Gly-Ala-Gly)
aa <- data.frame(Ala = 1, Gly = 2)
# Calculate Zc, nH2O, and length
Zc(aa)
nH2O(aa)
plength(aa)

# Make a plot with formatted labels
plot(Zc(aa), nH2O(aa), xlab = cplab$Zc, ylab = cplab$nH2O)

[Package canprot version 2.0.0 Index]