extract_metal_binders {protti}    R Documentation

Extract metal-binding protein information from UniProt

Description

Information on metal-binding proteins is extracted from UniProt data retrieved with fetch_uniprot() as well as from QuickGO data retrieved with fetch_quickgo().

Usage

extract_metal_binders(
  data_uniprot,
  data_quickgo,
  data_chebi = NULL,
  data_chebi_relation = NULL,
  data_eco = NULL,
  data_eco_relation = NULL,
  show_progress = TRUE
)

Arguments

data_uniprot

a data frame containing at least the ft_binding, cc_cofactor and cc_catalytic_activity columns.

data_quickgo

a data frame containing molecular function gene ontology information for at least the proteins of interest. This data should be obtained by calling fetch_quickgo().

data_chebi

optional, a data frame that can be manually obtained with fetch_chebi(stars = c(2, 3)). It should contain 2 and 3 star entries. If not provided, it is fetched within the function. If the function is run many times, providing the data frame is recommended to save time.

data_chebi_relation

optional, a data frame that can be manually obtained with fetch_chebi(relation = TRUE). If not provided, it is fetched within the function. If the function is run many times, providing the data frame is recommended to save time.

data_eco

optional, a data frame that contains evidence and conclusion ontology data and can be obtained by calling fetch_eco(). If not provided, it is fetched within the function. If the function is run many times, providing the data frame is recommended to save time.

data_eco_relation

optional, a data frame that contains relational evidence and conclusion ontology data and can be obtained by calling fetch_eco(return_relation = TRUE). If not provided, it is fetched within the function. If the function is run many times, providing the data frame is recommended to save time.

show_progress

a logical value that specifies if progress will be shown (default is TRUE).

Value

A data frame containing information on protein metal binding state. It contains the following columns:

For each protein identifier, the data frame contains information on the bound ligand and, if known, on its position. Since information about metal ligands can come from multiple sources, additional information (e.g. evidence) is nested in the returned data frame. To unnest the relevant information, take the following steps:

1. Split entries that hold multiple values. The "most_specific_id" column can contain multiple IDs, which means that one position cannot be uniquely attributed to one specific ligand, even with the same ligand_identifier. In the "most_specific_id" column these instances are separated by ","; in all other columns the corresponding information is separated by "||".

2. Split the information based on its source (this does not refer to the source column, which can be removed from the data frame). Certain columns are associated with specific sources (e.g. go_term is associated with the "go_term" source). Replace the values of columns that are not relevant for a given source with NA.

3. Unnest the chebi_id column and its associated columns, in which information is separated by "|". This is necessary because one most_specific_id can have multiple chebi_ids associated with it.

4. Unnest evidence and additional information by splitting first on ";;" and then on ";".
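The delimiter-based splitting described above can be sketched on toy data. This is a minimal base R illustration, not the package's own procedure: the helper split_rows() is hypothetical, the data frame nested holds made-up values, and the eco column name is an assumption about how an evidence column might look. The sketch covers only the "||", "|", ";;" and ";" splits; the source-specific replacement with NA is left out, and each level is split independently, which may need to be adapted to the columns actually returned.

```r
# Hypothetical helper: split the given columns of a data frame on a fixed
# separator and expand each row into one row per piece. Columns split in
# the same call are expanded in parallel (shorter ones are recycled).
split_rows <- function(df, cols, sep) {
  pieces <- lapply(seq_len(nrow(df)), function(i) {
    parts <- lapply(
      df[i, cols, drop = FALSE],
      function(x) strsplit(x, sep, fixed = TRUE)[[1]]
    )
    n <- max(lengths(parts))
    out <- df[rep(i, n), , drop = FALSE]
    for (col in cols) out[[col]] <- rep_len(parts[[col]], n)
    out
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Toy data with made-up values that only mimic the nesting delimiters.
nested <- data.frame(
  accession = "P00000",
  most_specific_id = "29105",
  chebi_id = "29105||29105|29033",
  eco = "ECO:0000269;ECO:0007744||ECO:0000255;;ECO:0000305",
  stringsAsFactors = FALSE
)

# Step 1: entries belonging together are separated by "||".
step1 <- split_rows(nested, c("chebi_id", "eco"), "||")

# Step 3: several chebi_ids per most_specific_id are separated by "|".
by_chebi <- split_rows(step1, "chebi_id", "|")

# Step 4: evidence is split first on ";;" and only then on ";".
flat <- split_rows(split_rows(by_chebi, "eco", ";;"), "eco", ";")

nrow(step1)    # 2
nrow(by_chebi) # 3
nrow(flat)     # 6
```

The order of the splits matters: splitting on ";" before ";;" (or on "|" before "||") would break the longer delimiters apart and lose the grouping they encode.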

Examples


# Create example data

uniprot_ids <- c("P00393", "P06129", "A0A0C5Q309", "A0A0C9VD04")

## UniProt data
data_uniprot <- fetch_uniprot(
  uniprot_ids = uniprot_ids,
  columns = c(
    "ft_binding",
    "cc_cofactor",
    "cc_catalytic_activity"
  )
)

## QuickGO data
data_quickgo <- fetch_quickgo(
  id_annotations = uniprot_ids,
  ontology_annotations = "molecular_function"
)

## ChEBI data (2 and 3 star entries)
data_chebi <- fetch_chebi(stars = c(2, 3))
data_chebi_relation <- fetch_chebi(relation = TRUE)

## ECO data
eco <- fetch_eco()
eco_relation <- fetch_eco(return_relation = TRUE)

# Extract metal binding information
metal_info <- extract_metal_binders(
  data_uniprot = data_uniprot,
  data_quickgo = data_quickgo,
  data_chebi = data_chebi,
  data_chebi_relation = data_chebi_relation,
  data_eco = eco,
  data_eco_relation = eco_relation
)

metal_info


[Package protti version 0.8.0 Index]