R: Mapping the tree clustering solution to a known source of...

treesource {C443}

R Documentation

Mapping the tree clustering solution to a known source of variation underlying the forest

Description

A function that can be used to get insight into a clusterforest solution, in the case that there are known sources of variation underlying the forest. These known sources of variation must be included in the clusterforest object (and thus must be defined when running the clusterforest function) In case of a categorical covariate, it visualizes the number of trees from each value of the covariate that belong to each cluster. In case of a continuous covariate, it returns the mean and standard deviation of the covariate in each cluster.

Usage

treesource(clusterforest, solution)

Arguments

`clusterforest`	The clusterforest object, indluding the treecov attribute.
`solution`	The solution

Value

`multiplot`	In case of categorical covariate, for each value of the covariate, a bar plot with the number of trees that belong to each cluster
`heatmap`	In case of a categorical covariate, a heatmap with for each value of the covariate, the number of trees that belong to each cluster
`clustermeans`	In case of a continuous covariate, the mean of the covariate in each cluster
`clusterstds`	In case of a continuous covariate, the standard deviation of the covariate in each cluster

Examples

require(rpart)
data_Amphet <-drugs[,c ("Amphet","Age", "Gender", "Edu", "Neuro", "Extr", "Open", "Agree",
"Consc", "Impul","Sensat")]
data_cocaine <-drugs[,c ("Coke","Age", "Gender", "Edu", "Neuro", "Extr", "Open", "Agree",
                         "Consc", "Impul","Sensat")]


#Function to draw a bootstrap sample from a dataset
DrawBoots <- function(dataset, i){
set.seed(2394 + i)
Boot <- dataset[sample(1:nrow(dataset), size = nrow(dataset), replace = TRUE),]
return(Boot)
}

#Function to grow a tree using rpart on a dataset
GrowTree <- function(x,y,BootsSample, minsplit = 40, minbucket = 20, maxdepth =3){

 controlrpart <- rpart.control(minsplit = minsplit, minbucket = minbucket, maxdepth = maxdepth,
 maxsurrogate = 0, maxcompete = 0)
 tree <- rpart(as.formula(paste(noquote(paste(y, "~")), noquote(paste(x, collapse="+")))),
  data = BootsSample, control = controlrpart)
 return(tree)
}

#Draw bootstrap samples and grow trees
BootsA<- lapply(1:5, function(k) DrawBoots(data_Amphet,k))
BootsC<- lapply(1:5, function(k) DrawBoots(data_cocaine,k))
Boots = c(BootsA,BootsC)

TreesA <- lapply(1:5, function (i) GrowTree(x=c ("Age", "Gender", "Edu", "Neuro",
"Extr", "Open", "Agree","Consc", "Impul","Sensat"), y="Amphet", BootsA[[i]] ))
TreesC <- lapply(1:5, function (i) GrowTree(x=c ( "Age", "Gender", "Edu", "Neuro",
"Extr", "Open", "Agree", "Consc", "Impul","Sensat"), y="Coke", BootsC[[i]] ))
Trees=c(TreesA,TreesC)

#Cluster the trees
ClusterForest<- clusterforest(observeddata=drugs,treedata=Boots,trees=Trees,m=1,
fromclus=2, toclus=2, treecov=rep(c("Amphet","Coke"),each=5), sameobs=FALSE, no_cores=2)

#Link cluster result to known source of variation
treesource(ClusterForest, 2)

[Package C443 version 3.4.0 Index]