learn.structure {bnstruct}  R Documentation 
learn the structure of a network.
Description
Learn the structure (the directed acyclic graph) of a BN object according to a BNDataset.
Usage
learn.structure(
bn,
dataset,
algo = "mmhc",
scoring.func = "BDeu",
initial.network = NULL,
alpha = 0.05,
ess = 1,
bootstrap = FALSE,
layering = c(),
max.fanin = num.variables(dataset),
max.fanin.layers = NULL,
max.parents = num.variables(dataset),
max.parents.layers = NULL,
layer.struct = NULL,
cont.nodes = c(),
use.imputed.data = FALSE,
use.cpc = TRUE,
mandatory.edges = NULL,
...
)
## S4 method for signature 'BN,BNDataset'
learn.structure(
bn,
dataset,
algo = "mmhc",
scoring.func = "BDeu",
initial.network = NULL,
alpha = 0.05,
ess = 1,
bootstrap = FALSE,
layering = c(),
max.fanin = num.variables(dataset) - 1,
max.fanin.layers = NULL,
max.parents = num.variables(dataset) - 1,
max.parents.layers = NULL,
layer.struct = NULL,
cont.nodes = c(),
use.imputed.data = FALSE,
use.cpc = TRUE,
mandatory.edges = NULL,
...
)
Arguments
bn 
a BN object.
dataset 
a BNDataset object.
algo 
the algorithm to use. Currently, one among sm, mmpc, hc, mmhc (default) and sem.
scoring.func 
the scoring function to use. Currently, one among BDeu (default), AIC and BIC.
initial.network 
network structure to be used as starting point for structure search.
Can take different values:
a 
alpha 
confidence threshold (only for 
ess 
Equivalent Sample Size value. 
bootstrap 

layering 
vector containing the layers each node belongs to (only for 
max.fanin 
maximum number of parents for each node (only for 
max.fanin.layers 
matrix of available parents in each layer (only for 
max.parents 
maximum number of parents for each node (for 
max.parents.layers 
matrix of available parents in each layer (only for 
layer.struct 

cont.nodes 
vector containing the index of continuous variables. 
use.imputed.data 

use.cpc 
(when using 
mandatory.edges 
binary matrix, where a 
... 
potential further arguments for method. 
Details
We provide five algorithms for learning the structure of the network; the algorithm is chosen with the algo parameter.
The first is the Silander-Myllymäki (sm) exact search-and-score algorithm, which performs a complete evaluation
of the search space in order to discover the best network; this algorithm may take a very long time, and can be
inapplicable when discovering networks with more than 25–30 nodes. Even for small networks, users are strongly
encouraged to provide meaningful parameters such as the layering of the nodes, or the maximum number of parents;
refer to the documentation in the package manual for more details on the method parameters.
The second method is the constraint-based Max-Min Parents-and-Children (mmpc), which returns the skeleton of the network.
Because the discovered edges are undirected and may therefore form loops, no parameter learning is possible using
this algorithm. Also note that in the case of a very dense network and many observations, the statistical evaluation
of the search space may take a long time. This algorithm also has parameters that may need to be tuned,
mainly the confidence threshold of the statistical pruning. Please refer to the rest of this documentation for their explanation.
The third algorithm is another heuristic, Hill-Climbing (hc). It can start from the complete space of possibilities
(default) or from a reduced subset of possible edges, using the cpc argument.
The fourth algorithm (and the default one) is the Max-Min Hill-Climbing heuristic (mmhc), which performs a statistical
sieving of the search space followed by a greedy evaluation, by combining the MMPC and the HC algorithms.
It is considerably faster than the exact sm method, at the cost of a (likely) lower-quality network.
As for MMPC, the computational time depends on the density of the network, the number of observations and
the tuning of the parameters.
The fifth method is the Structural Expectation-Maximization (sem) algorithm,
for learning a network from a dataset with missing values. It alternates Expectation-Maximization (in order to “fill in”
the holes in the dataset) with structure learning from the completed dataset, until convergence. For computational
reasons, the structure learning used inside SEM is MMHC. Convergence of SEM can be controlled with the parameters
struct.threshold and param.threshold, for the structure and the parameter convergence, respectively.
Search-and-score methods also need a scoring function to evaluate each candidate configuration of nodes.
We provide three of the most popular scoring functions: BDeu (Bayesian-Dirichlet equivalent uniform, default),
AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion). The scoring function
can be chosen using the scoring.func parameter.
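As a minimal sketch of selecting the scoring function, the snippet below learns a structure twice with hill-climbing, once with BDeu and once with BIC, and counts the edges in each result. It assumes the bnstruct package is installed and uses its bundled asia() dataset; the variable names (bn.bdeu, bn.bic, n.edges) are illustrative only, and the code is skipped if the package is unavailable.

```r
# Sketch: compare two scoring functions on the bundled Asia dataset.
# Assumption: bnstruct is installed; otherwise nothing is run.
if (requireNamespace("bnstruct", quietly = TRUE)) {
  library(bnstruct)

  dataset <- asia()        # example BNDataset shipped with bnstruct
  bn <- BN(dataset)        # empty network over the dataset variables

  # Same search algorithm, different scoring functions
  bn.bdeu <- learn.structure(bn, dataset, algo = "hc", scoring.func = "BDeu")
  bn.bic  <- learn.structure(bn, dataset, algo = "hc", scoring.func = "BIC")

  # dag() returns the adjacency matrix of the learned DAG;
  # summing its entries counts the directed edges
  n.edges <- c(BDeu = sum(dag(bn.bdeu)), BIC = sum(dag(bn.bic)))
  print(n.edges)
}
```

The two scores penalise model complexity differently, so the learned networks need not have the same number of edges.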
Structure learning sets the dag field of the BN under study, unless bootstrap or the mmpc algorithm
are employed. In these cases, given the possible presence of loops, the wpdag field is set.
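The difference between the two fields can be inspected with the dag() and wpdag() accessors. The following is a sketch only, assuming bnstruct is installed and using the bundled asia() dataset; the names bn.mmhc, bn.mmpc, d and w are illustrative.

```r
# Sketch: where the learned structure is stored, depending on the algorithm.
# Assumption: bnstruct is installed; otherwise nothing is run.
if (requireNamespace("bnstruct", quietly = TRUE)) {
  library(bnstruct)

  dataset <- asia()
  bn <- BN(dataset)

  # MMHC (default): the result is a DAG, stored in the dag field
  bn.mmhc <- learn.structure(bn, dataset, algo = "mmhc")
  d <- dag(bn.mmhc)

  # MMPC: only a skeleton is found, stored in the wpdag field
  bn.mmpc <- learn.structure(bn, dataset, algo = "mmpc")
  w <- wpdag(bn.mmpc)

  # Both are square matrices indexed by the network variables
  print(dim(d))
  print(dim(w))
}
```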
In case of missing data, the default behaviour (with no other indication from the user)
is to learn the structure using mmhc starting from the raw dataset.
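A short sketch of this default behaviour, assuming bnstruct is installed and using its bundled child() dataset, which contains missing values. The names bn.raw and adj are illustrative, and running time depends on the machine.

```r
# Sketch: structure learning on a dataset with missing values.
# Assumption: bnstruct is installed; otherwise nothing is run.
if (requireNamespace("bnstruct", quietly = TRUE)) {
  library(bnstruct)

  dataset <- child()       # bundled BNDataset with missing values
  bn <- BN(dataset)

  # No algo or use.imputed.data given: MMHC is run on the raw data
  bn.raw <- learn.structure(bn, dataset)

  adj <- dag(bn.raw)       # adjacency matrix of the learned DAG
  print(sum(adj))          # number of directed edges
}
```

Passing algo = "sem" instead selects the Structural Expectation-Maximization procedure described above, at a higher computational cost.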
Value
new BN object with DAG.
See Also
learn.network, learn.dynamic.network
Examples
## Not run:
dataset <- BNDataset("file.header", "file.data")
bn <- BN(dataset)
# use MMHC
bn <- learn.structure(bn, dataset, alpha=0.05, ess=1, bootstrap=FALSE)
# now use Silander-Myllymaki
layers <- layering(bn)
mfl <- as.matrix(read.table(header=FALSE,
text='0 1 1 1 1 0 1 1 1 1 0 0 8 7 7 0 0 0 14 6 0 0 0 0 19'))
bn <- learn.structure(bn, dataset, algo='sm', max.fanin=3, cont.nodes=c(),
layering=layers, max.fanin.layers=mfl, use.imputed.data=FALSE)
## End(Not run)