
compute_crosstalk {crosstalkr}

R Documentation

Identify proteins with a statistically significant relationship to user-provided seeds.

Description

compute_crosstalk returns a data frame of proteins that are significantly associated with user-defined seed proteins. These identified "crosstalkers" can be combined with the user-defined seed proteins to identify functionally relevant subnetworks. Affinity scores for every protein in the network are calculated using a random walk with restarts (see sparseRWR). Significance is determined by comparing these affinity scores to a bootstrapped null distribution (see bootstrap_null). If using a non-human PPI from STRING, refer to the stringdb documentation for how to specify proteins.
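The affinity calculation described above can be illustrated with a small base-R sketch. This is not the package's implementation: `rwr_sketch` is a hypothetical helper, it uses a dense matrix rather than the sparse format the package works with, and the update rule shown is the standard random-walk-with-restart iteration, which may differ in detail from sparseRWR. The argument names mirror `gamma`, `eps`, `tmax` and `norm` from Usage.

```r
# Hypothetical helper (not part of crosstalkr): dense random walk with restart.
rwr_sketch <- function(w, seeds, gamma = 0.6, eps = 1e-10, tmax = 1000, norm = TRUE) {
  if (norm) {
    cs <- colSums(w)
    cs[cs == 0] <- 1              # leave all-zero columns untouched
    w <- sweep(w, 2, cs, "/")     # divide each column by its sum
  }
  p0 <- numeric(nrow(w))
  p0[seeds] <- 1 / length(seeds)  # walker starts uniformly on the seeds
  p <- p0
  for (iter in seq_len(tmax)) {
    # with probability gamma jump back to the seeds, otherwise take one step
    p_next <- as.vector((1 - gamma) * (w %*% p) + gamma * p0)
    delta <- sum(abs(p_next - p))
    p <- p_next
    if (delta < eps) break        # steady state reached
  }
  p
}

# Toy path graph 1 - 2 - 3 - 4 with node 1 as the only seed
w <- matrix(0, 4, 4)
w[cbind(1:3, 2:4)] <- 1
w <- w + t(w)
affinity <- rwr_sketch(w, seeds = 1)
```

In this toy graph the affinity is highest at the seed and falls off with distance from it, which is the behaviour the affinity scores are meant to capture.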

Usage

compute_crosstalk(
  seed_proteins,
  g = NULL,
  use_ppi = TRUE,
  ppi = "stringdb",
  species = "homo sapiens",
  n = 1000,
  union = FALSE,
  intersection = FALSE,
  gamma = 0.6,
  eps = 1e-10,
  tmax = 1000,
  norm = TRUE,
  set_seed,
  cache = NULL,
  min_score = 700,
  seed_name = NULL,
  ncores = 1,
  significance_level = 0.95,
  p_adjust = "bonferroni",
  agg_int = 100,
  return_g = FALSE
)

Arguments

`seed_proteins`	user-defined seed proteins
`g`	igraph network object.
`use_ppi`	bool, should g be a protein-protein interaction network? If FALSE, the user must provide an igraph object in `g`.
`ppi`	character string describing the ppi to use: currently only "stringdb" and "biogrid" are supported.
`species`	character string describing the species of interest. For a list of supported species, see `supported_species`. Non-human species are only compatible with "stringdb".
`n`	number of random walks with restarts used to create the null distribution
`union`	bool, should we take the union of stringdb and biogrid to compute the PPI? Only applicable for the human PPI.
`intersection`	bool, should we take the intersection of stringdb and biogrid to compute the PPI? Only applicable for the human PPI.
`gamma`	restart probability
`eps`	maximum allowed difference between the computed probabilities at the steady state
`tmax`	the maximum number of iterations for the RWR
`norm`	if TRUE, the adjacency matrix w is normalized by dividing each value by its column sum.
`set_seed`	integer to set random number seed - for reproducibility
`cache`	A filepath to a folder where downloaded files should be stored
`min_score`	minimum connectivity score for each edge in the network.
`seed_name`	Name to give the cached null distribution - must be a character string
`ncores`	Number of cores to use - defaults to 1. Significant speedup can be achieved by using multiple cores for computation.
`significance_level`	user-defined significance level for hypothesis testing
`p_adjust`	adjustment method to correct for multiple hypothesis testing: defaults to "bonferroni". See `p.adjust.methods` for other available adjustment methods.
`agg_int`	number of runs to perform before aggregating the results - necessary to save memory. Set to a lower number to save even more memory.
`return_g`	bool, should we return the graph used? Mostly for internal use.
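To show how `significance_level` and `p_adjust` might interact, here is a minimal base-R sketch. It rests on several assumptions that this page does not confirm: that observed affinity scores are compared with the null runs through a z-score, that the adjusted p-value is tested against `1 - significance_level`, and that the null scores look like the simulated matrix below. `null_scores` and `observed` are made-up illustration data, not package output; only `p.adjust` is a real base-R function.

```r
set.seed(1)

# Made-up data: 200 null runs (rows) over 50 nodes (columns)
n_nodes <- 50
null_scores <- matrix(rnorm(200 * n_nodes, mean = 0.01, sd = 0.002), nrow = 200)
observed <- rnorm(n_nodes, mean = 0.01, sd = 0.002)
observed[1:3] <- 0.05                       # three nodes with clearly elevated affinity

# Assumed test: z-score of each observed score against its null distribution
z <- (observed - colMeans(null_scores)) / apply(null_scores, 2, sd)
p_raw <- pnorm(z, lower.tail = FALSE)       # one-sided: unusually high affinity

# Multiple-testing correction, then threshold at 1 - significance_level
p_adj <- p.adjust(p_raw, method = "bonferroni")
significance_level <- 0.95
crosstalkers <- which(p_adj < 1 - significance_level)
```

Any method listed in `p.adjust.methods` can be substituted for "bonferroni" in the `p.adjust` call; Bonferroni is the most conservative of them.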

Value

A data frame containing the affinity score and p-value for all "crosstalkers" related to a given set of seeds.

Examples


#1) Easy to use for querying biological networks. n = 10000 is more appropriate for actual analyses.
#compute_crosstalk(c("EGFR", "KRAS"), n = 10)

#2) Also works for any other kind of graph: just specify g (must be an igraph object for now)
g <- igraph::sample_gnp(n = 1000, p = 10/1000)
compute_crosstalk(c(1,3,5,8,10), g = g, use_ppi = FALSE, n = 100)

[Package crosstalkr version 1.0.5 Index]