markrank {Corbi}    R Documentation

MarkRank

Description

MarkRank is a network-based model that identifies cooperative biomarkers for the diagnosis of heterogeneous complex diseases.

Usage

markrank(
  dataset,
  label,
  adj_matrix,
  alpha = 0.8,
  lambda = 0.2,
  eps = 1e-10,
  E_value = NULL,
  trace = TRUE,
  d = Inf,
  Given_NET2 = NULL
)
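A toy input in the shape required by the Usage above can be sketched as follows. check_markrank_inputs() and is_connected() are hypothetical helpers, not part of Corbi. They only test the constraints stated under Arguments (samples in rows, genes in columns, one 0/1 label per sample, a connected 0/1 adjacency matrix in gene order), and the connectivity test assumes an undirected network.

```r
# Hypothetical helper (not part of Corbi): checks the shapes and 0/1 encodings
# described under Arguments before calling markrank().
check_markrank_inputs <- function(dataset, label, adj_matrix) {
  stopifnot(is.matrix(dataset), is.numeric(dataset))
  n <- nrow(dataset)  # samples
  p <- ncol(dataset)  # genes
  # label: one 0/1 entry per sample (row) of dataset
  stopifnot(length(label) == n, all(label %in% c(0, 1)))
  # adj_matrix: p x p, 0/1 entries, nodes in the same order as the gene columns
  stopifnot(is.matrix(adj_matrix), nrow(adj_matrix) == p,
            ncol(adj_matrix) == p, all(adj_matrix %in% c(0, 1)))
  if (!is.null(colnames(dataset)) && !is.null(colnames(adj_matrix))) {
    stopifnot(identical(colnames(dataset), colnames(adj_matrix)))
  }
  invisible(TRUE)
}

# Breadth-first search from node 1; assumes an undirected network.
is_connected <- function(adj) {
  und <- (adj + t(adj)) > 0
  visited <- rep(FALSE, nrow(adj))
  visited[1] <- TRUE
  frontier <- 1
  while (length(frontier) > 0) {
    nbrs <- which(colSums(und[frontier, , drop = FALSE]) > 0 & !visited)
    visited[nbrs] <- TRUE
    frontier <- nbrs
  }
  all(visited)
}

# Toy input: 6 samples x 4 genes, path network g1 - g2 - g3 - g4
set.seed(1)
genes <- paste0("g", 1:4)
dataset <- matrix(rnorm(6 * 4), nrow = 6, dimnames = list(NULL, genes))
label <- c(0, 0, 0, 1, 1, 1)
adj_matrix <- matrix(0, 4, 4, dimnames = list(genes, genes))
adj_matrix[cbind(1:3, 2:4)] <- 1
adj_matrix <- adj_matrix + t(adj_matrix)
check_markrank_inputs(dataset, label, adj_matrix)
is_connected(adj_matrix)
```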

Arguments

dataset

The microarray expression matrix for the disease of interest. Each row represents a sample and each column represents a gene.

label

The 0-1 binary phenotype vector of the samples in dataset. Its length must equal the number of samples (rows) in dataset.

adj_matrix

The 0-1 binary adjacency matrix of a connected biological network. Its nodes must be in the same order as the genes (columns) of the expression matrix.

alpha

The convex combination coefficient that balances the network effect against the prior information vector E_value. alpha must lie in [0,1]; a larger alpha places more emphasis on the network information. The default value is 0.8.

lambda

In the random-walk-based iteration, matrix A1 reflects the structure of the biological network, whereas matrix A2 reflects the cooperative effect of gene combinations. Parameter lambda is the convex combination coefficient of these two network effects. lambda must lie in [0,1]; a larger lambda places more emphasis on A1. The default value is 0.2.

eps

The stopping criterion (convergence tolerance) for the iterative solution method. The default value is 1e-10.

E_value

A vector containing prior information about the importance of each node. If NULL (the default), the absolute Pearson correlation coefficient (PCC) is used.

trace

Logical value indicating whether to print tracing information on the progress of the gene cooperation network construction.

d

Threshold for simplifying the G_2 computation. Only gene pairs whose shortest distance in the PPI network is less than d participate in the G_2 computation. The default value is Inf.

Given_NET2

A previously computed gene cooperation network (for example, the NET2 component of an earlier markrank result), supplied to tune parameters without recomputing G_2. See Details for a more specific description.
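The default E_value can be sketched as follows, assuming the PCC is computed between each gene's expression (a column of dataset) and the 0/1 label. default_prior() is a hypothetical name, and the exact definition used by Corbi should be checked in its source.

```r
# Hypothetical sketch of the default prior: |PCC| between each gene (column
# of dataset) and the 0/1 phenotype label. This pairing is an assumption.
default_prior <- function(dataset, label) {
  as.vector(abs(cor(dataset, label)))  # cor() returns a p x 1 matrix here
}

set.seed(2)
dataset <- matrix(rnorm(8 * 3), nrow = 8)  # 8 samples x 3 genes
label <- rep(c(0, 1), each = 4)
dataset[, 1] <- dataset[, 1] + 3 * label   # make gene 1 differ by phenotype
E_value <- default_prior(dataset, label)
E_value  # one non-negative prior value per gene
```

A user-supplied E_value would likewise need one value per gene, in the column order of dataset.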

Details

MarkRank is a network-based biomarker identification method that prioritizes disease genes by integrating multi-source information: the biological network (e.g. the protein-protein interaction (PPI) network), prior information about related diseases, and the discriminative power of cooperative gene combinations. MarkRank shows that explicitly modeling gene cooperative effects can greatly improve biomarker identification for complex diseases, especially those with high heterogeneity.

The MarkRank algorithm consists of two main steps: 1) construction of the gene cooperation network G_2 and 2) a random-walk-based iteration procedure. The following notes are intended to help users apply markrank:

1) For the construction of the gene cooperation network, we suggest setting trace=TRUE to print the progress of the G_2 computation. The G_2 construction step is finished when the printed number equals the number of genes in the input expression matrix. The parameter d uses the structure of the biological network to speed up the construction of G_2: only gene pairs whose shortest distance in the network is less than d participate in the G_2 computation. We suggest keeping the default d=Inf to make full use of the information in the expression matrix. If the user sets a finite d, the distance matrix of the input network is returned in dis.
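The effect of d can be sketched with hop distances from breadth-first search. hop_distances() and pairs_within() are hypothetical helpers, and Corbi's own distance computation may differ; the sketch only shows which gene pairs satisfy the rule that the shortest distance must be less than d.

```r
# Hypothetical helper: all-pairs shortest-path (hop) distances of an unweighted,
# undirected network via breadth-first search from every node.
# Unreachable pairs keep the value Inf.
hop_distances <- function(adj) {
  p <- nrow(adj)
  und <- (adj + t(adj)) > 0
  dis <- matrix(Inf, p, p, dimnames = dimnames(adj))
  for (s in seq_len(p)) {
    dis[s, s] <- 0
    frontier <- s
    step <- 0
    while (length(frontier) > 0) {
      step <- step + 1
      # unvisited neighbours of the current frontier are `step` hops away
      nbrs <- which(colSums(und[frontier, , drop = FALSE]) > 0 &
                    is.infinite(dis[s, ]))
      dis[s, nbrs] <- step
      frontier <- nbrs
    }
  }
  dis
}

# Gene pairs (i < j) whose shortest distance is less than d
pairs_within <- function(dis, d) {
  which(dis < d & upper.tri(dis), arr.ind = TRUE)
}

# Path network g1 - g2 - g3 - g4
genes <- paste0("g", 1:4)
adj <- matrix(0, 4, 4, dimnames = list(genes, genes))
adj[cbind(1:3, 2:4)] <- 1
adj <- adj + t(adj)
dis <- hop_distances(adj)
pairs_within(dis, d = 2)    # only directly linked pairs
pairs_within(dis, d = Inf)  # every pair, as with the default d = Inf
```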

2) MarkRank scores each gene with a random-walk-based iteration procedure. The detailed formula is:

score = alpha*[lambda*A1 + (1-lambda)*A2]*score + (1-alpha)*E_value.

Users can choose parameter settings appropriate to their application; our suggested values are alpha=0.8 and lambda=0.2. The input parameter combination and the number of iteration steps are returned in the output components initial_pars and steps, respectively. Because the iteration step is separate from the cooperation network construction, the parameter Given_NET2 can be used to tune the model parameters. Specifically, the user can set

Given_NET2 = result$NET2

in the markrank call to avoid recomputing G_2, where result is the value returned by an earlier call to markrank.
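The iteration in item 2, together with the reuse of G_2 during tuning, can be sketched in R as follows. This is not Corbi's implementation: it assumes that A1 and A2 are column-normalized versions of the network adjacency matrix and of NET2, that the iteration starts from E_value, and that it stops once the L1 change between successive score vectors falls below eps; col_normalize() and markrank_iterate() are hypothetical names. With alpha < 1 the update is a contraction, so it converges to the solution of (I - alpha*A) score = (1 - alpha)*E_value with A = lambda*A1 + (1 - lambda)*A2.

```r
# Hypothetical sketch of the scoring step (not Corbi's implementation).
# Assumption: A1 and A2 are column-normalized so each nonzero column sums to 1.
col_normalize <- function(W) {
  s <- colSums(W)
  s[s == 0] <- 1  # all-zero columns stay zero instead of dividing by 0
  sweep(W, 2, s, "/")
}

# score = alpha*[lambda*A1 + (1-lambda)*A2]*score + (1-alpha)*E_value,
# iterated from score = E_value until the L1 change drops below eps (assumed rule).
markrank_iterate <- function(A1, A2, E_value, alpha = 0.8, lambda = 0.2,
                             eps = 1e-10, max_steps = 1000) {
  A <- lambda * A1 + (1 - lambda) * A2
  score <- E_value
  for (steps in seq_len(max_steps)) {
    new_score <- alpha * as.vector(A %*% score) + (1 - alpha) * E_value
    if (sum(abs(new_score - score)) < eps) {
      return(list(score = new_score, steps = steps))
    }
    score <- new_score
  }
  list(score = score, steps = max_steps)
}

G1 <- matrix(c(0, 1, 0,
               1, 0, 1,
               0, 1, 0), 3, byrow = TRUE)       # toy network adjacency
NET2 <- matrix(c(0,   0.5, 0.2,
                 0.5, 0,   0.9,
                 0.2, 0.9, 0), 3, byrow = TRUE) # toy cooperation weights
E_value <- c(0.1, 0.7, 0.3)
A1 <- col_normalize(G1)
A2 <- col_normalize(NET2)  # computed once, reused for every setting below

res <- markrank_iterate(A1, A2, E_value)
res$steps

# Tuning idea behind Given_NET2: only the cheap iteration is repeated.
grid <- expand.grid(alpha = c(0.6, 0.8), lambda = c(0.2, 0.5))
grid$steps <- mapply(function(a, l) markrank_iterate(A1, A2, E_value, a, l)$steps,
                     grid$alpha, grid$lambda)
grid
```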

3) The final MarkRank score of each gene is returned in the output component score. Users can sort these scores and use the top-ranked genes for further analysis.
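Selecting the top-ranked genes needs only base R. top_genes() is a hypothetical helper, and the scores below are made-up toy values.

```r
# Hypothetical helper: the k highest-scoring genes, best first.
top_genes <- function(score, gene_names = names(score), k = 10) {
  if (is.null(gene_names)) gene_names <- paste0("gene", seq_along(score))
  ord <- order(score, decreasing = TRUE)
  head(data.frame(gene = gene_names[ord], score = score[ord],
                  stringsAsFactors = FALSE), k)
}

score <- c(g1 = 0.12, g2 = 0.40, g3 = 0.05, g4 = 0.31)  # toy scores
top_genes(score, k = 2)
```

With a markrank result this would be top_genes(result$score); if score carries no gene names, pass colnames(dataset) as gene_names.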

Value

This function will return a list with the following components:

score

The vector of final MarkRank scores for each gene.

steps

The number of iteration steps performed by the random-walk-based scoring procedure.

NET2

The weighted adjacency matrix of the gene cooperation network G_2.

initial_pars

The initial/input parameter values used in MarkRank.

dis

The pairwise distance matrix of the input network. This component is NULL if d=Inf.
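The returned components fit together in the tuning workflow described under Details. The sketch below assumes the Corbi package is installed and uses only the arguments and components documented on this page; tune_markrank() is a hypothetical wrapper and the grid values are arbitrary.

```r
# Hypothetical wrapper (assumes the Corbi package is installed). It calls
# markrank() once to build G_2, then reuses NET2 via Given_NET2 so that only
# the random-walk scoring is repeated for each (alpha, lambda) pair.
tune_markrank <- function(dataset, label, adj_matrix,
                          alphas = c(0.6, 0.8), lambdas = c(0.2, 0.5)) {
  first <- Corbi::markrank(dataset, label, adj_matrix, trace = FALSE)
  grid <- expand.grid(alpha = alphas, lambda = lambdas)
  grid$steps <- NA_real_
  scores <- vector("list", nrow(grid))
  for (k in seq_len(nrow(grid))) {
    res <- Corbi::markrank(dataset, label, adj_matrix,
                           alpha = grid$alpha[k], lambda = grid$lambda[k],
                           trace = FALSE, Given_NET2 = first$NET2)
    grid$steps[k] <- res$steps  # iteration count for this setting
    scores[[k]] <- res$score    # MarkRank scores for this setting
  }
  list(grid = grid, scores = scores, NET2 = first$NET2)
}
```

For example, tune_markrank(dataset, label, adj_matrix)$grid would list the number of iteration steps for each (alpha, lambda) pair, with G_2 built only by the first call.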

References

Duanchen Sun, Xianwen Ren, Eszter Ari, Tamas Korcsmaros, Peter Csermely, Ling-Yun Wu. Discovering cooperative biomarkers for heterogeneous complex disease diagnoses. Briefings in Bioinformatics, 20(1), 89–101, 2019.


[Package Corbi version 0.6-2 Index]