ModiSkeleton {MRPC} | R Documentation |
Infer a graph skeleton (undirected graph)
Description
This function implements the MRPC algorithm of Badsha and Fu (2019) and Badsha et al. (2021) to infer a graph skeleton (i.e., an undirected graph). It is based on the function skeleton from the pcalg
package (Kalisch et al., 2012). Both functions perform marginal and conditional independence tests. However, ModiSkeleton
implements an online false discovery rate (FDR) control method to control the overall FDR, whereas skeleton controls only the type I error rate for each individual test. See Details below.
Usage
ModiSkeleton(data, suffStat, FDR, alpha, indepTest, labels, p,
             method = c("stable", "original", "stable.fast"),
             m.max = Inf, fixedGaps = NULL, fixedEdges = NULL,
             NAdelete = TRUE, FDRcontrol = c("LOND", "ADDIS", "NONE"),
             tau, lambda, verbose = FALSE)
Arguments
Many arguments are similar to those in skeleton and pc in the pcalg
package. Several arguments here are also arguments for the function MRPC.
data |
Data matrix, where the rows are samples and the columns are features (e.g., genetic variants (GVs) and phenotypes). Columns for GVs, if available, appear before the columns for phenotypes (e.g., gene expression). For example, if there is one GV, then the first column of the data matrix is the GV and the remaining columns are the gene expression data. |
suffStat |
A list of sufficient statistics. When the data is continuous or can be viewed as continuous, this list contains the correlation matrix of the data and the sample size, which are the necessary elements for the conditional independence tests in gaussCItest. When the data is discrete, this list contains the entire dataset. |
FDR |
Desired overall FDR level. |
alpha |
Significance level (a number in (0,1)) for the individual tests. |
indepTest |
Name of the statistical test. It is used to test the independence of x and y given S, where x and y are variables and S is a vector, possibly empty, of variables. The return value of indepTest is the p-value of the test for conditional independence. Different tests may be used for different data types. For example, ci.test in the |
labels |
A character vector of names of variables (nodes). These are typically the column names of the data matrix. |
p |
(optional) The number of variables (nodes). Needs to be specified if the labels are not provided, in which case the labels are set to 1:p. |
method |
(optional) Character string specifying the method. The default, "stable", provides an order-independent skeleton. |
m.max |
(optional) Maximum size of the conditioning sets that are considered in the conditional independence tests. |
fixedGaps |
(optional) A logical matrix of dimension p*p. If entry [x, y], [y, x], or both are TRUE, the edge x—y is removed before starting the algorithm. Therefore, this edge is guaranteed to be absent in the resulting graph. |
fixedEdges |
(optional) A logical matrix of dimension p*p. If entry [x, y], [y, x], or both are TRUE, the edge x—y is never considered for removal. Therefore, this edge is guaranteed to be present in the resulting graph. |
NAdelete |
(optional) If indepTest returns NA and this option is TRUE, the corresponding edge is deleted. If this option is FALSE, the edge is not deleted. |
FDRcontrol |
A character string specifying whether online FDR control should be applied and, if so, which method to use. The two FDR control options are "LOND" (Javanmard and Montanari, 2015) and "ADDIS" (Tian and Ramdas, 2019). If "NONE" is specified, the type I error rate alpha is used for each test. |
tau |
A number between 0 and 1. This value is used to determine if a p-value will be considered for testing. For example, if a p-value is greater than tau then it is discarded and no test will be performed. |
lambda |
A number between 0 and tau. This value is used to determine if a p-value is a candidate for rejection. For example, if a p-value is smaller than lambda then it can be rejected when testing the hypothesis (if the p-value is smaller than alphai). |
verbose |
(optional) If TRUE, detailed output is provided. The default is FALSE (no detailed output). |
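The roles of tau and lambda described above can be illustrated with a minimal sketch. The p-values and the two threshold values below are made up for illustration and are not package defaults; the sketch only mirrors the two rules stated in the argument descriptions (discard when the p-value exceeds tau, candidate when it is below lambda) and is not the package's internal code.

```r
# Hypothetical p-values and thresholds, for illustration only.
pval   <- c(0.001, 0.20, 0.04, 0.90, 0.60)
tau    <- 0.5    # assumed value: p-values above tau are discarded
lambda <- 0.25   # assumed value: must lie between 0 and tau

stopifnot(lambda > 0, lambda < tau, tau < 1)

discarded <- pval > tau      # no test is performed for these p-values
candidate <- pval < lambda   # eligible for rejection (still needs p < alphai)

data.frame(pval, discarded, candidate)
```

A candidate p-value is rejected only if it is also smaller than the alphai value computed for that test.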
Details
The ModiSkeleton function incorporates sequential hypothesis testing to infer the graph skeleton. This function starts with a complete graph (all nodes are connected with undirected edges) and performs a series of marginal and conditional independence tests, removing the corresponding edge whenever the null hypothesis of independence is not rejected.
First, all pairs of nodes are tested for marginal independence. If two nodes x and y are judged to be marginally independent at a type I error rate alpha, the edge between them is deleted and the empty set is saved as separation sets S[x, y] and S[y, x]. After all pairs have been tested for marginal independence, some edges may be removed.
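The marginal step can be sketched in base R as follows. This is a simplified illustration that assumes Gaussian data, uses a Fisher z-test on the Pearson correlation, and applies a fixed alpha to every test with no FDR control; the simulated variables and the helper fisher_z_pvalue are hypothetical and not part of the package.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rnorm(n)                  # generated independently of x
z <- x + rnorm(n, sd = 0.5)    # depends on x
dat <- cbind(x = x, y = y, z = z)
C <- cor(dat)

# Two-sided p-value of the Fisher z-test for a (partial) correlation r
# with k conditioning variables (k = 0 for the marginal test).
fisher_z_pvalue <- function(r, n, k = 0) {
  stat <- sqrt(n - k - 3) * atanh(r)
  2 * pnorm(abs(stat), lower.tail = FALSE)
}

alpha <- 0.05
p <- ncol(dat)
# Start from the complete graph (no self-loops).
adj <- matrix(TRUE, p, p, dimnames = list(colnames(dat), colnames(dat)))
diag(adj) <- FALSE

for (i in 1:(p - 1)) {
  for (j in (i + 1):p) {
    pv <- fisher_z_pvalue(C[i, j], n)
    # Independence not rejected: delete the edge in both directions.
    if (pv >= alpha) adj[i, j] <- adj[j, i] <- FALSE
  }
}
adj
```

Edges that survive this pass are the ones examined in the conditional tests of the next step.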
Second, each pair of nodes (x, y) that still shares an edge is tested for conditional independence given subsets of the neighboring nodes. If there is any node z such that x and y are conditionally independent given z, the edge between x and y is removed and node z is saved as the separation set (sepset) in S[x, y] and S[y, x]. The algorithm continues in this way, increasing the size of the conditioning set step by step. The algorithm stops when all adjacency sets in the current graph are smaller than the size of the conditioning set. The result is the skeleton, in which every edge is still undirected.
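A minimal sketch of one conditional test, assuming Gaussian data and a partial-correlation Fisher z-test. The correlation matrix is a made-up example for a chain x - z - y, the sample size is assumed, and partial_cor and ci_pvalue are hypothetical helpers, not package functions.

```r
# Hypothetical correlation matrix: x and y are correlated only through z.
nodes <- c("x", "y", "z")
C <- matrix(c(1.00, 0.25, 0.50,
              0.25, 1.00, 0.50,
              0.50, 0.50, 1.00),
            nrow = 3, byrow = TRUE, dimnames = list(nodes, nodes))
n <- 200  # assumed sample size

# Partial correlation of i and j given the set S, via the inverse of the
# correlation sub-matrix.
partial_cor <- function(C, i, j, S = character(0)) {
  if (length(S) == 0) return(C[i, j])
  P <- solve(C[c(i, j, S), c(i, j, S)])
  -P[1, 2] / sqrt(P[1, 1] * P[2, 2])
}

# Fisher z-test p-value for independence of i and j given S.
ci_pvalue <- function(C, n, i, j, S = character(0)) {
  r <- partial_cor(C, i, j, S)
  stat <- sqrt(n - length(S) - 3) * atanh(r)
  2 * pnorm(abs(stat), lower.tail = FALSE)
}

alpha <- 0.05
p_marginal <- ci_pvalue(C, n, "x", "y")        # conditioning set of size 0
p_given_z  <- ci_pvalue(C, n, "x", "y", "z")   # conditioning set of size 1

edge_xy <- TRUE
sepset_xy <- NULL
if (p_marginal >= alpha) {
  edge_xy <- FALSE
  sepset_xy <- character(0)   # empty separation set
} else if (p_given_z >= alpha) {
  edge_xy <- FALSE
  sepset_xy <- "z"            # z separates x and y
}
```

In this example the marginal test rejects independence, but the test given z does not, so the edge x - y is removed and z is recorded as the separation set.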
Unlike existing algorithms, which control only the type I error rate for each individual test, MRPC implements the LOND (Level On the Number of Discoveries) method (Javanmard and Montanari, 2015). LOND is a sequential hypothesis testing procedure that sets the value of alpha for each test based on the number of discoveries (i.e., rejections) made so far, in order to control the overall false discovery rate.
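The LOND update can be sketched as follows. This is a simplified illustration, not the package's internal code: it assumes the level for test i is FDR * gammai[i] * (number of rejections so far + 1), and it borrows the p-series constants listed under Value for gammai as an assumption. The function lond_levels and the p-values are hypothetical.

```r
# Sketch of LOND: the level for each test grows with the number of
# discoveries (rejections) made in earlier tests.
lond_levels <- function(pval, FDR = 0.05,
                        normalizer = 0.4374901658, exponent = 1.6) {
  m <- length(pval)
  gammai <- normalizer / seq_len(m)^exponent  # assumed weight sequence
  alphai <- numeric(m)
  R <- integer(m)   # 1 = null rejected, 0 = not rejected
  K <- 0            # running number of rejections
  for (i in seq_len(m)) {
    alphai[i] <- FDR * gammai[i] * (K + 1)
    R[i] <- as.integer(pval[i] <= alphai[i])
    K <- K + R[i]
  }
  list(alphai = alphai, R = R, K = K)
}

pval <- c(1e-6, 0.5, 1e-4, 0.03)  # made-up p-values in test order
res <- lond_levels(pval)
res$R   # decisions per test
res$K   # total number of rejections
```

Because a rejection means that dependence was detected, the edges for tests with R equal to 1 are kept, while edges for non-rejected tests are removed.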
Value
An object containing an estimate of the skeleton of the underlying DAG, as follows:
call
:A call object: the original function call.
n
:The sample size used to estimate the graph.
max.ord
:The maximum size of the conditioning set used in the conditional independence tests of the first part of the algorithm.
n.edgetests
:The number of conditional independence tests performed by the first part of the algorithm.
sepset
:Separation sets.
pMax
:A square matrix, where the (i, j)th entry contains the maximum p-value of all conditional independence tests for edge i–j.
graph
:Object of class "graph": The undirected or partially directed graph that was estimated.
zMin
:Deprecated.
test
:The number of tests that have been performed.
alpha
:The level of significance for the current test.
R
:All of the decisions made from tests that have been performed. A 1 indicates a rejected null hypothesis and 0 represents a null hypothesis that was not rejected.
K
:The total number of rejections.
pval
:A vector of p-values calculated for each test.
normalizer
:The value that ensures the vector gammai sums to one.
exponent
:The exponent of the p-series used to calculate each value of the gammai vector.
alphai
:A vector containing the alpha value calculated for each test.
kappai
:A vector containing the iteration at which each rejected test occurs.
kappai_star
:Each element of this vector is the sum of the Si vector up to the iteration at which each rejection occurs.
Ci
:A vector indicating whether or not a p-value is a candidate for being rejected.
Si
:A vector indicating whether or not a p-value was discarded.
Ci_plus
:Each element of this vector represents the number of times each kappai value was counted when calculating each alphai value.
gammai
:The elements of this vector are the values of the p-series 0.4374901658/(m^(1.6)), where m is the iteration at which each test is performed.
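As a numerical check of the normalizer and gammai entries above, the sketch below sums the p-series over the first million iterations and adds an integral approximation of the remaining tail. The truncation point and the tail approximation are choices made here for illustration.

```r
normalizer <- 0.4374901658
exponent   <- 1.6
N <- 1e6                                # truncation point (arbitrary)

m <- seq_len(N)
gammai <- normalizer / m^exponent       # p-series values for tests 1..N
partial_sum <- sum(gammai)

# Approximate the tail sum beyond N by the integral of normalizer / x^exponent.
tail_approx <- normalizer * N^(1 - exponent) / (exponent - 1)

partial_sum                 # slightly below 1
partial_sum + tail_approx   # very close to 1
```

The partial sum stays below one for any finite number of tests, which is what keeps the total significance budget within the target FDR.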
Author(s)
Md Bahadur Badsha (mbbadshar@gmail.com)
References
1. Badsha MB and Fu AQ (2019). Learning causal biological networks with the principle of Mendelian randomization. Frontiers in Genetics, 10:460.
2. Badsha MB, Martin EA and Fu AQ (2021). MRPC: An R package for inference of causal graphs. Frontiers in Genetics, 12:651812.
3. Javanmard A and Montanari A (2015). On Online Control of False Discovery Rate. arXiv:1502.06197 [stat.ME].
4. Kalisch M, Machler M, Colombo D, Maathuis MH and Buhlmann P (2012). Causal Inference Using Graphical Models with the R Package pcalg. Journal of Statistical Software, 47(11):1-26.
5. Tian J and Ramdas A (2019). ADDIS: an adaptive discarding algorithm for online FDR control with conservative nulls. In Advances in Neural Information Processing Systems (pp. 9388-9396).
See Also
MRPC; EdgeOrientation; SimulateData.
Examples
## Not run:
# Model 1 (mediation)
# The 1st column of the data matrix is a genetic variant
# and the remaining columns are gene expression nodes.
library(MRPC)        # provides ModiSkeleton and the simulated data sets
data <- simu_data_M1 # load data for model 1
n <- nrow(data)      # Number of rows (samples)
V <- colnames(data)  # Column names (node labels)
# Calculate Pearson correlation
suffStat_C <- list(C = cor(data),
                   n = n)
# Infer a graph skeleton
Skel.fit <- ModiSkeleton(data,
                         suffStat = suffStat_C,
                         FDR = 0.05,
                         indepTest = 'gaussCItest',
                         labels = V,
                         FDRcontrol = 'LOND',
                         verbose = FALSE)
# Plot the results
plot(Skel.fit@graph,
     main = "Estimated Skeleton")
# Other models are available and may be called as follows:
# Model 0
# data <- simu_data_M0
# Model 2
# data <- simu_data_M2
# Model 3
# data <- simu_data_M3
# Model 4
# data <- simu_data_M4
# Model Multiparent
# data <- simu_data_multiparent
# Model Star
# data <- simu_data_starshaped
# Model Layered
# data <- simu_data_layered
## End(Not run)