Calculates amino acid chemical properties for sequence data


aminoAcidProperties calculates amino acid sequence physicochemical properties, including length, hydrophobicity, bulkiness, polarity, aliphatic index, net charge, acidic residue content, basic residue content, and aromatic residue content.


aminoAcidProperties(
  data,
  property = c("length", "gravy", "bulk", "aliphatic", "polarity", "charge", "basic",
    "acidic", "aromatic"),
  seq = "junction",
  nt = TRUE,
  trim = FALSE,
  label = NULL,
  ...
)



data.frame containing sequence data.


vector of strings specifying the properties to be calculated. Defaults to calculating all defined properties.


character name of the column containing input sequences.


boolean; if TRUE, the input sequences are DNA and will be translated to amino acids before the properties are calculated.


if TRUE, remove the first and last codon or amino acid from each sequence before calculating properties. If FALSE, do not modify the input sequences.


name of sequence region to add as prefix to output column names.


additional named arguments to pass to the functions gravy, bulk, aliphatic, polar or charge.


For all properties except for length, non-informative positions are excluded, where non-informative is defined as any character in c("X", "-", ".", "*").

The scores for gravy, bulkiness and polarity are calculated as simple averages of the scores for each informative position. The basic, acidic and aromatic indices are calculated as the fraction of informative positions falling into the given category.
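
The averaging and fraction rules above can be illustrated with a small base R sketch. It is not the package's implementation: the helper names (informative_positions, mean_score, class_fraction), the toy score vector and the residue class are all assumptions made for illustration.

```r
# Characters treated as non-informative, taken from the description above
non_informative <- c("X", "-", ".", "*")

# Hypothetical helper: split a sequence and keep only informative positions
informative_positions <- function(aa) {
  chars <- strsplit(aa, "")[[1]]
  chars[!(chars %in% non_informative)]
}

# Hypothetical helper: simple average of per-residue scores
mean_score <- function(aa, scores) {
  mean(scores[informative_positions(aa)])
}

# Hypothetical helper: fraction of informative positions in a residue class
class_fraction <- function(aa, residues) {
  mean(informative_positions(aa) %in% residues)
}

# Illustrative inputs only (assumptions, not the package defaults)
toy_scores <- c(C = 2.5, A = 1.8, R = -4.5, K = -3.9)
basic_residues <- c("R", "H", "K")

# "X", "-" and "*" are dropped, leaving C, A, R and K
mean_score("CARX-K*", toy_scores)
class_fraction("CARX-K*", basic_residues)
```

In this sketch the denominator is always the number of informative positions, so padding a sequence with gap or stop characters does not change either result.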

The aliphatic index is calculated using the Ikai, 1980 method.
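
As a rough illustration of the Ikai formula, the sketch below weights the alanine, valine, and isoleucine plus leucine content by 1, 2.9 and 3.9 respectively. The function name aliphatic_index is hypothetical, and the use of mole fractions rather than mole percent (as in the original publication) is an assumption; the package may scale its result differently.

```r
# Hypothetical helper: Ikai-style aliphatic index over informative positions.
# Uses mole fractions (assumption); multiply by 100 for mole percent.
aliphatic_index <- function(aa) {
  chars <- strsplit(aa, "")[[1]]
  informative <- chars[!(chars %in% c("X", "-", ".", "*"))]
  n <- length(informative)
  if (n == 0) return(NA_real_)

  # Fraction of informative positions matching the given residues
  frac <- function(residues) sum(informative %in% residues) / n

  # Weights 2.9 (Val) and 3.9 (Ile, Leu) are the coefficients from Ikai, 1980
  frac("A") + 2.9 * frac("V") + 3.9 * frac(c("I", "L"))
}

aliphatic_index("AVIL")
```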

The net charge is calculated using the method of Moore, 1985, excluding the N-terminus and C-terminus charges, and normalizing by the number of informative positions. The default pH for the calculation is 7.4.
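
The charge calculation can be sketched as a sum of Henderson-Hasselbalch terms over the titratable side chains, ignoring both termini and dividing by the number of informative positions. The function name net_charge, the choice of titratable residues and the pK values below are assumptions for illustration and are not necessarily the defaults used by the package.

```r
# Illustrative side-chain pK values (assumption, not the package defaults)
pk_values <- c(R = 12.5, H = 6.5, K = 10.8, D = 3.9, E = 4.1, C = 8.5, Y = 10.1)
positive_residues <- c("R", "H", "K")

# Hypothetical helper: normalized net charge without terminal groups
net_charge <- function(aa, pH = 7.4) {
  chars <- strsplit(aa, "")[[1]]
  informative <- chars[!(chars %in% c("X", "-", ".", "*"))]
  if (length(informative) == 0) return(NA_real_)

  # Only residues with a pK entry contribute to the charge
  titratable <- informative[informative %in% names(pk_values)]
  pk <- pk_values[titratable]
  is_positive <- titratable %in% positive_residues

  # Fractional charge of each site at the given pH
  site_charge <- ifelse(is_positive,
                        1 / (1 + 10^(pH - pk)),
                        -1 / (1 + 10^(pk - pH)))

  # Normalize by the number of informative positions
  sum(site_charge) / length(informative)
}

net_charge("CARDK")
net_charge("CARDK", pH = 7.0)
```

A site contributes half a unit of charge when the pH equals its pK, which gives a quick sanity check for any substituted pK table.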

The following data sources were used for the default property scores:


A modified data data.frame with the following columns:

Where * is the value from label or the name specified for seq if label=NULL.


See Also

See countPatterns for counting the occurrence of specific amino acid subsequences. See gravy, bulk, aliphatic, polar and charge for functions that calculate the included properties individually.


# Subset example data
db <- ExampleDb[c(1,10,100), c("sequence_id", "junction")]

# Calculate default amino acid properties from DNA sequences
aminoAcidProperties(db, seq="junction")
# Calculate default amino acid properties from amino acid sequences
# Use a custom output column prefix
db$junction_aa <- translateDNA(db$junction)
aminoAcidProperties(db, seq="junction_aa", label="junction", nt=FALSE)

# Use the Grantham, 1974 side chain volume scores from the seqinr package
# Set pH=7.0 for the charge calculation
# Calculate only average volume and charge
# Remove the head and tail amino acids from the junction, thus making it the CDR3
# Load the aaindex data set from the seqinr package
library(seqinr)
data(aaindex)
x <- aaindex[["GRAR740103"]]$I
# Rename the score vector to use single-letter codes
names(x) <- translateStrings(names(x), ABBREV_AA)
# Calculate properties
aminoAcidProperties(db, property=c("bulk", "charge"), seq="junction",
                    trim=TRUE, label="cdr3", bulkiness=x, pH=7.0)

