Skeleton of the MMHC algorithm {Rfast2} | R Documentation
The skeleton of a Bayesian network learned with the MMHC algorithm
Description
The skeleton of a Bayesian network learned with the MMHC algorithm.
Usage
mmhc.skel(x, method = "pearson", max_k = 3, alpha = 0.05,
ini.stat = NULL, R = NULL, parallel = FALSE)
Arguments
x: A numerical matrix with the variables. If you have a data.frame (i.e. categorical data), turn it into a matrix first, for example with data.matrix().
method: If you have continuous data, this must be "pearson". If you have categorical data, this must be "cat". In that case, make sure the minimum value of each variable is zero; the function "g2Test" in the R package Rfast and the relevant functions work that way. A conversion sketch is given after this list.
max_k: The maximum size of the conditioning set to use in the conditional independence test (see Details). Integer, default value is 3.
alpha: The significance level (suitable values in (0, 1)) for assessing the p-values. Default value is 0.05.
ini.stat: If the initial test statistics (univariate associations) are available, pass them through this parameter.
R: If the correlation matrix is available, pass it here.
parallel: Set this to TRUE for parallel computations.
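For the categorical case, a small preprocessing step is enough to satisfy the zero-minimum requirement. The following is a minimal sketch, assuming the data start as a data.frame of factors (the object z is hypothetical); data.matrix() codes factor levels as 1, 2, ..., so subtracting 1 makes every variable start at 0.

# z: a hypothetical data.frame of factors
z <- as.data.frame( matrix( sample( c("a", "b", "c"), 200 * 5, replace = TRUE ), ncol = 5 ) )
z[] <- lapply(z, factor)
# data.matrix() codes the levels as 1, 2, ..., so subtract 1 to start at 0
x <- data.matrix(z) - 1
a <- mmhc.skel(x, method = "cat")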
Details
The max_k option sets the maximum size of the conditioning set used in the conditional independence test. Larger values provide more accurate results, at the cost of higher computational times. When the sample size is small (e.g., fewer than 50 observations), max_k should be small, say 3, otherwise the conditional independence test may not be able to provide reliable results.
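As a rough illustration of this trade-off, the sketch below runs the function twice on the same simulated data with different values of max_k and compares the reported runtimes (any accuracy gain depends on the data).

x <- matrix( rnorm(300 * 20), nrow = 300 )
a1 <- mmhc.skel(x, max_k = 2)   # smaller conditioning sets, faster
a2 <- mmhc.skel(x, max_k = 4)   # larger conditioning sets, slower but potentially more accurate
a1$runtime
a2$runtime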
Value
A list including:
ini.stat: The test statistics of the univariate associations.
ini.pvalue: The initial p-values of the univariate associations.
pvalue: A matrix with the logarithm of the p-values of the updated associations. Each final p-value is the maximum of the two relevant p-values.
runtime: The duration of the algorithm.
ntests: The number of tests conducted during each k.
G: The adjacency matrix. A value of 1 in G[i, j] also appears in G[j, i], indicating that variables i and j are joined by an edge.
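A short sketch of how the returned list might be inspected: since pvalue stores logged p-values, they are exponentiated before reading, the symmetry of G can be verified directly, and the last line simply lists the detected edges.

x <- matrix( rnorm(200 * 10), nrow = 200 )
a <- mmhc.skel(x)
exp( a$pvalue[1:4, 1:4] )            # back-transform the logged p-values
isSymmetric(a$G)                     # the adjacency matrix is symmetric
which( a$G == 1, arr.ind = TRUE )    # pairs of variables joined by an edge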
Author(s)
Michail Tsagris and Stefanos Fafalios.
R implementation and documentation: Michail Tsagris mtsagris@uoc.gr and Stefanos Fafalios stefanosfafalios@gmail.com.
References
Tsamardinos, I., Aliferis, C. F. and Statnikov, A. (2003). Time and sample efficient discovery of Markov blankets and direct causal relations. In Proceedings of the ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 673-678). ACM.
Brown, L. E., Tsamardinos, I. and Aliferis, C. F. (2004). A novel algorithm for scalable and accurate Bayesian network learning. Medinfo, 711-715.
Tsamardinos, I., Brown, L. E. and Aliferis, C. F. (2006). The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning, 65(1): 31-78.
See Also
Examples
# simulate a dataset with continuous data
x <- matrix( rnorm(300 * 30, 1, 100), nrow = 300 )
a <- mmhc.skel(x)
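# If the correlation matrix is already available it can be passed through the
# R argument so it is not recomputed; a minimal sketch continuing the example above:
r <- cor(x)
a2 <- mmhc.skel(x, R = r)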