compute_crosstalk {crosstalkr} | R Documentation |
Identify proteins with a statistically significant relationship to user-provided seeds.
Description
compute_crosstalk returns a data frame of proteins that are significantly
associated with user-defined seed proteins. These identified "crosstalkers"
can be combined with the user-defined seed proteins to identify functionally
relevant subnetworks. Affinity scores for every protein in the network are
calculated using a random walk with restart (sparseRWR). Significance is
determined by comparing these affinity scores to a bootstrapped null distribution
(see bootstrap_null). If using a non-human PPI from STRING, refer to the stringdb
documentation for how to specify proteins.
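To make the affinity-score step concrete, here is a minimal base-R sketch of a random walk with restart on a small dense matrix. It is an illustration only, not the package's sparseRWR implementation: the function name rwr_sketch, the L1 convergence check, and the toy path graph are all assumptions made for this example. The arguments gamma, eps, tmax and norm mirror the parameters documented below.

```r
# Illustrative random walk with restart (hypothetical helper, not part of crosstalkr).
rwr_sketch <- function(w, seeds, gamma = 0.6, eps = 1e-10, tmax = 1000, norm = TRUE) {
  if (norm) {
    # Column-normalize so each column sums to 1 (isolated nodes are left as zeros).
    cs <- colSums(w)
    cs[cs == 0] <- 1
    w <- sweep(w, 2, cs, "/")
  }
  # Restart distribution: probability mass spread evenly over the seeds.
  p0 <- numeric(nrow(w))
  p0[seeds] <- 1 / length(seeds)
  p <- p0
  for (t in seq_len(tmax)) {
    # Walk one step with probability (1 - gamma), jump back to the seeds with probability gamma.
    p_next <- (1 - gamma) * as.vector(w %*% p) + gamma * p0
    # Assumed stopping rule: L1 change between iterations below eps.
    converged <- sum(abs(p_next - p)) < eps
    p <- p_next
    if (converged) break
  }
  p
}

# Toy 4-node path graph: 1 - 2 - 3 - 4, with node 1 as the only seed.
w <- matrix(0, 4, 4)
w[cbind(1:3, 2:4)] <- 1
w <- w + t(w)
affinity <- rwr_sketch(w, seeds = 1)
```

In this toy graph the affinity is highest at the seed and falls off with distance from it, which is the behaviour the affinity scores are meant to capture.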
Usage
compute_crosstalk(
seed_proteins,
g = NULL,
use_ppi = TRUE,
ppi = "stringdb",
species = "homo sapiens",
n = 1000,
union = FALSE,
intersection = FALSE,
gamma = 0.6,
eps = 1e-10,
tmax = 1000,
norm = TRUE,
set_seed,
cache = NULL,
min_score = 700,
seed_name = NULL,
ncores = 1,
significance_level = 0.95,
p_adjust = "bonferroni",
agg_int = 100,
return_g = FALSE
)
Arguments
seed_proteins |
user defined seed proteins |
g |
igraph network object. |
use_ppi |
bool, should g be a protein-protein interaction network? If
false, user must provide an igraph object in |
ppi |
character string describing the ppi to use: currently only "stringdb" and "biogrid" are supported. |
species |
character string describing the species of interest.
For a list of supported species, see |
n |
number of random walks with restart used to create the null distribution |
union |
bool, should we take the union of stringdb and biogrid to compute the PPI? Only applicable for the human PPI |
intersection |
bool, should we take the intersection of stringdb and biogrid to compute the PPI? Only applicable for the human PPI |
gamma |
restart probability |
eps |
maximum allowed difference between the computed probabilities at the steady state |
tmax |
the maximum number of iterations for the RWR |
norm |
if TRUE, w is normalized by dividing each value by its column sum. |
set_seed |
integer to set random number seed - for reproducibility |
cache |
A filepath to a folder where downloaded files should be stored |
min_score |
minimum connectivity score for each edge in the network. |
seed_name |
Name to give the cached null distribution - must be a character string |
ncores |
Number of cores to use - defaults to 1. Significant speedup can be achieved by using multiple cores for computation. |
significance_level |
user-defined significance level for hypothesis testing |
p_adjust |
adjustment method to correct for multiple hypothesis testing:
defaults to "bonferroni". see |
agg_int |
number of runs before the results are aggregated - necessary to save memory. Set to a lower number to save even more memory. |
return_g |
bool, should we return the graph used? mostly for internal use |
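The interplay of n, significance_level and p_adjust can be sketched in base R as follows. This is a hedged illustration, not the package's actual procedure: the toy scores, the empirical p-value formula, and the reading of significance_level = 0.95 as a cutoff of 1 - 0.95 on the adjusted p-value are all assumptions made for this example. Only stats::p.adjust is a real function being called.

```r
set.seed(1)

# Hypothetical observed affinity scores for 5 proteins.
observed <- c(0.30, 0.02, 0.05, 0.01, 0.20)

# Hypothetical null scores: one row per null run (n = 1000), one column per protein.
n <- 1000
null_scores <- matrix(runif(n * 5, min = 0, max = 0.1), nrow = n)

# Assumed empirical p-value: share of null scores at least as large as the
# observed score, with a +1 correction so no p-value is exactly zero.
exceeds <- sweep(null_scores, 2, observed, ">=")
p_raw <- (colSums(exceeds) + 1) / (n + 1)

# Correct for multiple testing, then apply the (assumed) cutoff.
p_adj <- p.adjust(p_raw, method = "bonferroni")
significance_level <- 0.95
significant <- p_adj < (1 - significance_level)
```

Here only the first and last proteins exceed every null score, so they are the only ones flagged after the Bonferroni correction. A larger n gives a finer-grained null distribution at the cost of more computation.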
Value
data frame containing the affinity score and p-value for all "crosstalkers" related to a given set of seeds
Examples
# 1) Easy to use for querying biological networks - n = 10000 is more appropriate for actual analyses
# compute_crosstalk(c("EGFR", "KRAS"), n = 10)
# 2) Also works for any other kind of graph - just specify g (must be an igraph object as of now)
g <- igraph::sample_gnp(n = 1000, p = 10/1000)
compute_crosstalk(c(1,3,5,8,10), g = g, use_ppi = FALSE, n = 100)