DM.Rpart {HMP}R Documentation

Dirichlet-Multinomial RPart

Description

This function combines recursive partitioning and the Dirichlet-Multinomial distribution to identify homogeneous subgroups of microbiome taxa count data.

Usage

DM.Rpart(data, covars, plot = TRUE, minsplit = 1, minbucket = 1, cp = 0, numCV = 10, 
	numCon = 100, parallel = FALSE, cores = 3, use1SE = FALSE, lowerSE = TRUE)

Arguments

data

A matrix of taxonomic counts(columns) for each sample(rows).

covars

A matrix of covariates(columns) for each sample(rows).

plot

When 'TRUE' a tree plot of the results will be generated.

minsplit

The minimum number of observations to split on, see rpart.control.

minbucket

The minimum number of observations in any terminal node, see rpart.control.

cp

The complexity parameter, see rpart.control.

numCV

The number folds for a k-fold cross validation. A value less than 2 will return the rpart result without any cross validation.

numCon

The number of cross validations to repeat to achieve a consensus solution.

parallel

When this is 'TRUE' it allows for parallel calculation of consensus. Requires the package doParallel.

cores

The number of parallel processes to run if parallel is 'TRUE'.

use1SE

See details.

lowerSE

See details.

Details

There are 3 ways to run this function. The first is setting numCV to less than 2, which will run rpart once using the DM distribution and the specified minsplit, minbucket and cp. This result will not have any kind of branch pruning and the objects returned 'fullTree' and 'bestTree' will be the same.

The second way is setting numCV to 2 or greater (we recommend 10) and setting numCon to less than 2. This will run rpart several times using a k-fold cross validation to prune the tree to its optimal size. This is the best method to use.

The third way is setting both numCV and numCon to 2 or greater (We recommend at least 100 for numCon). This will repeat the second way numCon times and build a consensus solution. This method is ONLY needed for low sample sizes.

When the argument 'use1SE' is 'FALSE', the returned object 'bestTree' is the pruned tree with the lowest MSE. When it is 'TRUE', 'bestTree' is either the biggest pruned tree (lowerSE = FALSE) or the smallest pruned tree (lowerSE = TRUE), that is within 1 standard error of the lowest MSE.

Value

The 3 main things returned are:

fullTree

An rpart object without any pruning.

bestTree

A pruned rpart object based on use1SE and lowerSE's settings.

cpTable

Information about the fullTree rpart object and how it splits.

The other variables returned include surrogate/competing splits, error rates and a plot of the bestTree if plot is TRUE.

Examples

	data(saliva)
	data(throat)
	data(tonsils)
	
	### Create some covariates for our data set
	site <- c(rep("Saliva", nrow(saliva)), rep("Throat", nrow(throat)), 
			rep("Tonsils", nrow(tonsils)))
	covars <- data.frame(Group=site)
	
	### Combine our data into a single object
	data <- rbind(saliva, throat, tonsils)
	
	### For a single rpart tree
	numCV <- 0
	numCon <- 0
	rpartRes <- DM.Rpart(data, covars, numCV=numCV, numCon=numCon)
	
	## Not run: 
		### For a cross validated rpart tree
		numCon <- 0
		rpartRes <- DM.Rpart(data, covars, numCon=numCon)
		
		### For a cross validated rpart tree with consensus
		numCon <- 2 # Note this is set to 2 for speed and should be at least 100
		rpartRes <- DM.Rpart(data, covars, numCon=numCon)
	
## End(Not run)

[Package HMP version 2.0.1 Index]