compute_crosstalk {crosstalkr}    R Documentation

Identify proteins with a statistically significant relationship to user-provided seeds.

Description

compute_crosstalk returns a data frame of proteins that are significantly associated with user-defined seed proteins. These identified "crosstalkers" can be combined with the seed proteins to identify functionally relevant subnetworks. Affinity scores for every protein in the network are calculated using a random walk with restart (see sparseRWR). Significance is determined by comparing these affinity scores to a bootstrapped null distribution (see bootstrap_null). If using a non-human PPI from STRING, refer to the stringdb documentation for how to specify proteins.
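The significance step compares each protein's observed affinity score with the scores it receives when the walk is started from random seed sets. A minimal sketch of one common way to turn such a comparison into a p-value, an empirical p-value with a +1 correction, is shown below. The helper `empirical_p` and the toy null scores are hypothetical, and bootstrap_null may summarize the null distribution differently.

```r
# Hypothetical helper (not part of crosstalkr): empirical p-value of an
# observed affinity score against null scores from random-seed runs.
empirical_p <- function(observed, null_scores) {
  # Count null runs at least as extreme as the observation; the +1 in the
  # numerator and denominator keeps the p-value strictly above zero.
  (1 + sum(null_scores >= observed)) / (1 + length(null_scores))
}

set.seed(42)
null_scores <- rexp(1000, rate = 100)  # toy null distribution (assumption)
empirical_p(0.05, null_scores)
```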

Usage

compute_crosstalk(
  seed_proteins,
  g = NULL,
  use_ppi = TRUE,
  ppi = "stringdb",
  species = "homo sapiens",
  n = 1000,
  union = FALSE,
  intersection = FALSE,
  gamma = 0.6,
  eps = 1e-10,
  tmax = 1000,
  norm = TRUE,
  set_seed,
  cache = NULL,
  min_score = 700,
  seed_name = NULL,
  ncores = 1,
  significance_level = 0.95,
  p_adjust = "bonferroni",
  agg_int = 100,
  return_g = FALSE
)

Arguments

seed_proteins

user-defined seed proteins (vertex identifiers of g when use_ppi = FALSE)

g

igraph network object. Must be supplied when use_ppi = FALSE.

use_ppi

logical; should g be a protein-protein interaction network? If FALSE, the user must provide an igraph object in g.

ppi

character string describing the PPI to use; currently only "stringdb" and "biogrid" are supported.

species

character string describing the species of interest. For a list of supported species, see supported_species. Non-human species are only compatible with "stringdb".

n

number of random walks with restart used to create the null distribution

union

logical; should the union of STRING and BioGRID be taken to build the PPI? Only applicable to the human PPI.

intersection

logical; should the intersection of STRING and BioGRID be taken to build the PPI? Only applicable to the human PPI.
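Conceptually, the union keeps an interaction reported by either database, while the intersection keeps only interactions reported by both. A minimal base-R sketch over two toy undirected edge lists follows; the edge lists and the `edge_key` helper are hypothetical and do not reflect how crosstalkr actually stores or merges STRING and BioGRID data.

```r
# Hypothetical helper: canonical key for an undirected edge so that
# "A-B" and "B-A" compare equal.
edge_key <- function(edges) {
  paste(pmin(edges$from, edges$to), pmax(edges$from, edges$to), sep = "|")
}

# Toy edge lists standing in for the two PPI sources (assumption).
string_edges  <- data.frame(from = c("EGFR", "KRAS", "TP53"),
                            to   = c("KRAS", "BRAF", "MDM2"),
                            stringsAsFactors = FALSE)
biogrid_edges <- data.frame(from = c("KRAS", "EGFR"),
                            to   = c("EGFR", "GRB2"),
                            stringsAsFactors = FALSE)

a <- edge_key(string_edges)
b <- edge_key(biogrid_edges)
union(a, b)      # interactions present in either source
intersect(a, b)  # interactions present in both sources
```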

gamma

restart probability

eps

maximum allowed difference between the computed probabilities at the steady state

tmax

the maximum number of iterations for the RWR

norm

if TRUE, the adjacency matrix w is normalized by dividing each value by its column sum.
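The gamma, eps, tmax and norm arguments correspond to the usual power-iteration form of a random walk with restart, p(t+1) = (1 - gamma) W p(t) + gamma p0, repeated until the change between iterations falls below eps or tmax iterations are reached. A minimal dense base-R sketch follows, assuming W is an adjacency matrix, p0 spreads the restart mass uniformly over the seeds, and convergence is measured by the largest absolute change. The function `rwr_sketch` is hypothetical; it is not crosstalkr's sparseRWR, which operates on sparse matrices.

```r
# Hypothetical sketch of a random walk with restart by power iteration.
rwr_sketch <- function(w, seeds, gamma = 0.6, eps = 1e-10, tmax = 1000,
                       norm = TRUE) {
  if (norm) {
    # Column-normalize so each column sums to 1; guard against
    # isolated nodes whose column sum is 0.
    cs <- colSums(w)
    cs[cs == 0] <- 1
    w <- sweep(w, 2, cs, "/")
  }
  # Restart vector: uniform probability over the seed nodes.
  p0 <- numeric(nrow(w))
  p0[seeds] <- 1 / length(seeds)
  p <- p0
  for (t in seq_len(tmax)) {
    # Follow an edge with probability 1 - gamma, restart with probability gamma.
    p_next <- (1 - gamma) * as.vector(w %*% p) + gamma * p0
    converged <- max(abs(p_next - p)) < eps
    p <- p_next
    if (converged) break
  }
  p
}

# Toy 4-node path graph 1-2-3-4 (assumption), seeded at node 1.
w <- matrix(c(0, 1, 0, 0,
              1, 0, 1, 0,
              0, 1, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)
scores <- rwr_sketch(w, seeds = 1)
round(scores, 3)
```

On this toy graph the affinity decays with distance from the seed, which is the behavior the affinity scores rely on.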

set_seed

integer used to set the random number seed, for reproducibility

cache

A file path to a folder where downloaded files should be stored

min_score

minimum connectivity score an edge must have to be included in the network.
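As an illustration of what such a threshold does, the sketch below keeps only edges whose score meets min_score. The edge list, its `score` column, and the use of `>=` rather than `>` are assumptions for illustration, not a description of crosstalkr's internal filtering.

```r
# Toy edge list with a hypothetical `score` column (assumption).
edges <- data.frame(from  = c("EGFR", "KRAS", "TP53"),
                    to    = c("KRAS", "BRAF", "MDM2"),
                    score = c(950, 640, 999),
                    stringsAsFactors = FALSE)
min_score <- 700
# Keep only edges whose score meets the threshold.
kept <- edges[edges$score >= min_score, ]
kept
```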

seed_name

Name to give the cached null distribution; must be a character string

ncores

Number of cores to use - defaults to 1. Significant speedup can be achieved by using multiple cores for computation.

significance_level

user-defined significance level for hypothesis testing

p_adjust

adjustment method used to correct for multiple hypothesis testing; defaults to "bonferroni". See p.adjust.methods for other available adjustment methods.
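The adjustment methods named here are those accepted by base R's p.adjust. The sketch below compares "bonferroni" and "holm" on toy p-values. Treating significance_level = 0.95 as a threshold of alpha = 1 - 0.95 on the adjusted p-values is an assumption about how the two arguments interact, and the p-values are made up for illustration.

```r
# Toy raw p-values for five candidate crosstalkers (assumption).
p_raw <- c(0.001, 0.004, 0.012, 0.03, 0.2)
significance_level <- 0.95
alpha <- 1 - significance_level  # assumed threshold, about 0.05

p_bonf <- p.adjust(p_raw, method = "bonferroni")
p_holm <- p.adjust(p_raw, method = "holm")

# Holm's step-down procedure rejects at least as many hypotheses
# as Bonferroni while controlling the same family-wise error rate.
which(p_bonf < alpha)
which(p_holm < alpha)
```

Here Bonferroni keeps the first two candidates, while Holm also keeps the third.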

agg_int

number of runs after which results are aggregated, which is necessary to save memory. Set to a lower number to save even more memory.
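A minimal sketch of why chunked aggregation saves memory: instead of storing all n null runs at once, runs are generated agg_int at a time and folded into running per-node sums, from which a mean and standard deviation are recovered at the end. The stand-in `one_null_run` and the choice to aggregate into mean and standard deviation are assumptions for illustration, not crosstalkr's actual bookkeeping.

```r
# Hypothetical stand-in for one null run: affinity scores for every node.
one_null_run <- function(n_nodes) runif(n_nodes)

aggregate_null <- function(n, agg_int, n_nodes) {
  total  <- numeric(n_nodes)  # running sum of scores per node
  total2 <- numeric(n_nodes)  # running sum of squared scores per node
  done <- 0
  while (done < n) {
    k <- min(agg_int, n - done)
    # Only k runs are held in memory at once (an n_nodes x k matrix).
    chunk <- vapply(seq_len(k), function(i) one_null_run(n_nodes),
                    numeric(n_nodes))
    total  <- total + rowSums(chunk)
    total2 <- total2 + rowSums(chunk^2)
    done <- done + k
  }
  mu <- total / n
  sigma <- sqrt((total2 - n * mu^2) / (n - 1))
  data.frame(mean = mu, sd = sigma)
}

set.seed(1)
null_summary <- aggregate_null(n = 1000, agg_int = 100, n_nodes = 5)
null_summary
```

A smaller agg_int lowers peak memory at the cost of more aggregation steps; the final summary does not depend on the chunk size.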

return_g

logical; should the graph used be returned? Mostly for internal use.

Value

data frame containing the affinity score and p-value for every "crosstalker" related to the given set of seeds

Examples


# 1) Easy to use for querying biological networks; n = 10000 is more appropriate for actual analyses
# compute_crosstalk(c("EGFR", "KRAS"), n = 10)

# 2) Also works for any other kind of graph: just specify g (it must currently be an igraph object)
g <- igraph::sample_gnp(n = 1000, p = 10/1000)
compute_crosstalk(c(1,3,5,8,10), g = g, use_ppi = FALSE, n = 100)



[Package crosstalkr version 1.0.5 Index]