find_peptide_in_structure {protti}    R Documentation

Finds peptide positions in a PDB structure based on positional matching

Description

Finds peptide positions in a PDB structure. Peptide positions in UniProt and in a PDB structure often differ, because a structure frequently covers only part of the UniProt sequence. This function maps a peptide onto a PDB structure based on its UniProt positions. This approach is more robust than aligning the peptide sequence to the PDB structure sequence, since it can also match the peptide if there are truncations or mismatches. The function also provides an easy way to check whether a peptide is present in a PDB structure.
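
To illustrate the idea of positional matching, here is a minimal base R sketch. It is not protti's implementation: the residue_map data frame and the map_by_position() helper are hypothetical and only assume that a residue-level correspondence between UniProt and PDB numbering is available.

```r
# Hypothetical residue-level mapping between UniProt and PDB numbering.
# In this toy case the structure covers UniProt residues 50-70 and
# numbers them 1-21.
residue_map <- data.frame(
  uniprot_pos = 50:70,
  pdb_pos = 1:21
)

# Hypothetical helper: map a UniProt range onto PDB numbering by position.
map_by_position <- function(start, end, residue_map) {
  covered <- residue_map[
    residue_map$uniprot_pos >= start & residue_map$uniprot_pos <= end,
  ]
  if (nrow(covered) == 0) {
    # The peptide lies completely outside the structure
    return(data.frame(
      pdb_start = NA_integer_,
      pdb_end = NA_integer_,
      n_residues = 0L
    ))
  }
  data.frame(
    pdb_start = min(covered$pdb_pos),
    pdb_end = max(covered$pdb_pos),
    n_residues = nrow(covered)
  )
}

full <- map_by_position(55, 66, residue_map)    # fully inside the structure
partial <- map_by_position(45, 56, residue_map) # structure is truncated
absent <- map_by_position(1, 10, residue_map)   # not in the structure
```

Because the lookup only uses positions, the partially covered peptide is still placed on the structure, whereas an exact search for the full peptide sequence in the structure sequence would fail.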

Usage

find_peptide_in_structure(
  peptide_data,
  peptide,
  start,
  end,
  uniprot_id,
  pdb_data = NULL,
  retain_columns = NULL
)

Arguments

peptide_data

a data frame containing at least the input columns to this function.

peptide

a character column in the peptide_data data frame that contains the sequence or any other unique identifier for the peptide that should be found.

start

a numeric column in the peptide_data data frame that contains start positions of peptides.

end

a numeric column in the peptide_data data frame that contains end positions of peptides.

uniprot_id

a character column in the peptide_data data frame that contains UniProt identifiers that correspond to the peptides.

pdb_data

optional, a data frame containing data obtained with fetch_pdb(). If not provided, the information is fetched automatically. If the function is run multiple times, it is faster to fetch the information once and provide it to the function. If provided, make sure that the column names are identical to the ones that would be obtained by calling fetch_pdb().

retain_columns

a vector indicating which columns of the input data frame should be retained. By default, no additional columns are retained (retain_columns = NULL). Specific columns can be retained by providing their names in a vector (not in quotation marks, just like other column names).

Value

A data frame that contains peptide positions in the corresponding PDB structures. If a peptide is not found in any structure or no structure is associated with the protein, the data frame contains NA values for the output columns. The data frame contains the following and additional columns:

Examples


# Create example data
peptide_data <- data.frame(
  uniprot_id = c("P0A8T7", "P0A8T7", "P60906"),
  peptide_sequence = c(
    "SGIVSFGKETKGKRRLVITPVDGSDPYEEMIPKWRQLNV",
    "NVFEGERVER",
    "AIGEVTDVVEKE"
  ),
  start = c(1160, 1197, 55),
  end = c(1198, 1206, 66)
)

# Find peptides in protein structure
peptide_in_structure <- find_peptide_in_structure(
  peptide_data = peptide_data,
  peptide = peptide_sequence,
  start = start,
  end = end,
  uniprot_id = uniprot_id
)

head(peptide_in_structure, n = 10)
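
As noted under Value, peptides without a matching structure receive NA in the output columns. The sketch below shows one way such a result could be screened for presence. It uses a mock data frame so that it runs without network access; the column names structure and pdb_start and all values are placeholders, not the actual output of find_peptide_in_structure().

```r
# Mock result with placeholder column names and values (assumption: one row
# per peptide and structure, NA where the peptide was not found).
mock_result <- data.frame(
  peptide_sequence = c("NVFEGERVER", "NVFEGERVER", "AIGEVTDVVEKE"),
  structure = c("structure_A", "structure_B", NA),
  pdb_start = c(1197, NA, NA)
)

# A peptide counts as present if it has a non-NA position in any structure
found <- tapply(
  !is.na(mock_result$pdb_start),
  mock_result$peptide_sequence,
  any
)

found
```

With real output, the same check would be applied to one of the position columns returned by the function.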


[Package protti version 0.9.0 Index]