Backward phase of MMPC {MXM} R Documentation

Backward phase of MMPC

Description

Performs the backward phase of the MMPC algorithm on a set of already selected variables.

Usage

mmpcbackphase(target, dataset, max_k = 3, threshold = 0.05, test = NULL,
wei = NULL, R = 1) 

Arguments

target

The class variable. Provide either a string, an integer, a numeric value, a vector, a factor, an ordered factor or a Surv object. See also Details.

dataset

The dataset; provide either a data frame or a matrix (columns = variables, rows = samples). Alternatively, provide an ExpressionSet (in which case rows are samples and columns are features, see Bioconductor for details).

max_k

The maximum size of the conditioning set to use in the conditional independence test (see Details). Integer, default value is 3.

threshold

Threshold (suitable values are in (0, 1)) for assessing the significance of p-values. Default value is 0.05.

test

The conditional independence test to use. Pass the test function itself, without quotation marks, e.g. testIndFisher, not "testIndFisher". Default value is NULL. See also CondIndTests.

wei

A vector of weights to be used for weighted regression. The default value is NULL.

R

The number of permutations; set to 1 by default (no permutation-based test). There is a trick to avoid doing all permutations. As soon as the number of times the permuted test statistic exceeds the observed test statistic is more than 50 (if threshold = 0.05 and R = 999), the p-value has exceeded the significance level (threshold value) and hence the predictor variable is not significant. There is no need to perform the remaining permutations, as a decision has already been made.
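The early-stopping rule can be illustrated with a minimal base R sketch. It is not the package's internal code: the function name perm_pvalue_early_stop, the use of the absolute Pearson correlation as the test statistic, and the p-value formula (exceed + 1) / (R + 1) are assumptions made only for illustration.

```r
# Hypothetical helper (not part of MXM): permutation test of association
# between x and y that stops once the decision "not significant" is certain.
perm_pvalue_early_stop <- function(x, y, R = 999, threshold = 0.05) {
  observed <- abs(cor(x, y))          # observed test statistic (assumed choice)
  limit <- threshold * (R + 1)        # 50 when threshold = 0.05 and R = 999
  exceed <- 0                         # permuted statistics above the observed one
  done <- 0                           # permutations actually performed
  for (i in seq_len(R)) {
    permuted <- abs(cor(sample(x), y))
    done <- i
    if (permuted > observed) exceed <- exceed + 1
    if (exceed > limit) break         # more than 50 exceedances: stop early
  }
  # Assumed p-value formula; after an early stop this is only a lower bound
  # on the p-value that all R permutations would have produced.
  list(pvalue = (exceed + 1) / (R + 1),
       permutations = done,
       stopped_early = done < R)
}

set.seed(123)
# Perfectly associated pair: no early stop, all R permutations are used
strong <- perm_pvalue_early_stop(1:50, 1:50)
# Uncorrelated pair: the loop stops long before R permutations
weak <- perm_pvalue_early_stop(c(-1, 1, -1, 1), c(-1, -1, 1, 1))
```

In the second call only a fraction of the 999 permutations is needed, which is the saving the rule is meant to deliver.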

Details

For each of the selected variables (the columns of dataset), the function performs conditional independence tests in which the conditioning sets are formed from the other variables. All possible combinations are tried until the variable becomes non-significant. The maximum size of the conditioning set is equal to max_k. MMPC calls this function when the backward phase is requested.
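The procedure described above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: backward_sketch and fisher_pvalue are hypothetical names, the test is a Fisher z-test on the partial correlation (in the spirit of testIndFisher), and a variable that fails a test is still allowed to appear in the conditioning sets of later variables.

```r
# Hypothetical helper: p-value of a Fisher z-test for the partial correlation
# between y and x given the columns of the matrix z.
fisher_pvalue <- function(y, x, z) {
  n <- length(y)
  k <- ncol(z)
  if (k > 0) {
    y <- resid(lm(y ~ z))   # remove the linear effect of the conditioning set
    x <- resid(lm(x ~ z))
  }
  stat <- sqrt(n - k - 3) * abs(atanh(cor(x, y)))   # Fisher z-transform
  2 * pnorm(stat, lower.tail = FALSE)
}

# Hypothetical sketch of a backward phase: test each variable against all
# conditioning sets of size 1..max_k drawn from the other variables.
backward_sketch <- function(target, dataset, max_k = 3, threshold = 0.05) {
  p <- ncol(dataset)
  keep <- rep(TRUE, p)      # TRUE while a variable remains significant
  max_p <- numeric(p)       # largest p-value observed for each variable
  counter <- 0              # number of tests performed
  for (j in seq_len(p)) {
    others <- setdiff(seq_len(p), j)
    for (k in seq_len(min(max_k, length(others)))) {
      for (idx in combn(length(others), k, simplify = FALSE)) {
        s <- others[idx]
        pv <- fisher_pvalue(target, dataset[, j], dataset[, s, drop = FALSE])
        counter <- counter + 1
        max_p[j] <- max(max_p[j], pv)
        if (pv > threshold) {           # non-significant: stop testing j
          keep[j] <- FALSE
          break
        }
      }
      if (!keep[j]) break
    }
  }
  list(keep = keep, counter = counter, max_pvalue = max_p)
}

set.seed(1)
n <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + rnorm(n, sd = 0.5)   # related to the target only through x1
target <- 2 * x1 + 2 * x2 + rnorm(n)
res <- backward_sketch(target, cbind(x1, x2, x3), max_k = 2)
res$keep
```

In this simulation x1 and x2 remain significant under every conditioning set, whereas x3 is a candidate for removal once x1 enters the conditioning set.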

Value

A list including:

met

A numerical vector of size equal to the number of columns of the dataset.

counter

The number of tests performed.

pvalues

The maximum p-value, on the logarithmic scale, for the association of each predictor variable.

Author(s)

Ioannis Tsamardinos, Michail Tsagris

R implementation and documentation: Michail Tsagris mtsagris@uoc.gr

References

Tsamardinos, Brown and Aliferis (2006). The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning, 65(1), 31-78.

See Also

MMPC, mmhc.skel, CondIndTests, cv.mmpc

Examples

library(MXM)
set.seed(123)

# simulate a dataset with continuous data
dataset <- matrix(runif(500 * 100, 1, 100), ncol = 100)

# define a simulated class variable
target <- 3 * dataset[, 10] + 2 * dataset[, 100] + 3 * dataset[, 20] + rnorm(500, 0, 5)

# MMPC algorithm, without and with the backward phase
m1 <- MMPC(target, dataset, max_k = 3, threshold = 0.05, test = "testIndFisher")
m2 <- MMPC(target, dataset, max_k = 3, threshold = 0.05, test = "testIndFisher", backward = TRUE)

# apply the backward phase to the variables selected by m1
x <- dataset[, m1@selectedVars, drop = FALSE]
mmpcbackphase(target, x, test = testIndFisher)

[Package MXM version 1.5.5 Index]