testRF {BAGofT} R Documentation

Testing random forests

Description

testRF specifies a random forest as the classifier to test. It returns a function that can be passed as the ‘testModel’ argument of BAGofT.

Usage

testRF(formula, ntree = 500, mtry = NULL, maxnodes = NULL)


Arguments

formula     an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to test.

ntree       number of trees to grow. The default is 500.

mtry        number of variables randomly sampled as candidates at each split. The default value is sqrt(p) where p is the number of covariates.

maxnodes    maximum number of terminal nodes trees in the forest can have.
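
As a rough illustration of the arguments above, the sketch below works out the documented default for mtry in a problem with 20 covariates and then builds a specification with non-default settings. The values 200 and 10 are arbitrary illustrative choices, not recommendations. Rounding sqrt(p) down is an assumption: it mirrors the classification default of randomForest, but this page does not confirm that testRF rounds the same way. The call to testRF is skipped when BAGofT is not installed.

```r
# number of covariates, matching the Examples section
p <- 20

# documented default for mtry is sqrt(p); rounding down is an assumption
mtryDefault <- floor(sqrt(p))

# build the specification only if BAGofT is available
if (requireNamespace("BAGofT", quietly = TRUE)) {
  # ntree = 200 and maxnodes = 10 are arbitrary illustrative values
  rfSpec <- BAGofT::testRF(formula = y ~ ., ntree = 200,
                           mtry = mtryDefault, maxnodes = 10)
  # per the Description, testRF returns a function for 'testModel'
  stopifnot(is.function(rfSpec))
}
```

Smaller values of ntree and maxnodes give faster but cruder forests, so any such settings should be chosen with the data at hand in mind.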

References

Zhang, Ding and Yang (2021) "Is a Classification Procedure Good Enough?-A Goodness-of-Fit Assessment Tool for Classification Learning", arXiv preprint arXiv:1911.03063v2.

Examples

## Not run:
###################################################
# Generate a sample dataset.
###################################################
# set the random seed
set.seed(20)
# set the number of observations
n <- 200
# set the number of covariates
p <- 20

# generate covariates data
Xdat <- matrix(runif((n*p), -5,5), nrow = n, ncol = p)
colnames(Xdat) <- paste("x", c(1:p), sep = "")

# generate random coefficients
betaVec <- rnorm(6)
# calculate the linear predictor data
lindat <-  3 * (Xdat[,1] < 2 & Xdat[,1] > -2) + -3 * (Xdat[,1] > 2 | Xdat[,1] < -2) +
0.5 * (Xdat[,2] + Xdat[, 3] + Xdat[,4] + Xdat[, 5])
# calculate the probabilities
pdat <- 1/(1 + exp(-lindat))

# generate the response data
ydat <- sapply(pdat, function(x) stats::rbinom(1, 1, x))

# generate the dataset
dat <- data.frame(y = ydat, Xdat)

###################################################
# Obtain the testing result
###################################################

# 50 percent training set
testRes1 <- BAGofT(testModel = testRF(formula = y ~.),
data = dat,
ne = n*0.5,
nsplits = 20,
nsim = 40)
# 75 percent training set
testRes2 <- BAGofT(testModel = testRF(formula = y ~.),
data = dat,
ne = n*0.75,
nsplits = 20,
nsim = 40)
# 90 percent training set
testRes3 <- BAGofT(testModel = testRF(formula = y ~.),
data = dat,
ne = n*0.9,
nsplits = 20,
nsim = 40)

# print the testing result.
print(c(testRes1$p.value, testRes2$p.value, testRes3$p.value))

## End(Not run)
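
The three calls above differ only in the value of ne, so they can also be written as a loop over training-set fractions. The following is a minimal sketch rather than part of the package: the names trainFractions, neValues and pValues are introduced here for illustration, and the test is run only when BAGofT is installed and the example data frame dat has already been created.

```r
# training-set fractions used in the three calls above
trainFractions <- c(0.5, 0.75, 0.9)
# number of observations, as in the example
n <- 200
# sizes passed as 'ne' in each call
neValues <- n * trainFractions

# run only when BAGofT is installed and the example data 'dat' exists
if (requireNamespace("BAGofT", quietly = TRUE) && exists("dat")) {
  pValues <- sapply(neValues, function(ne) {
    res <- BAGofT::BAGofT(testModel = BAGofT::testRF(formula = y ~ .),
                          data = dat,
                          ne = ne,
                          nsplits = 20,
                          nsim = 40)
    res$p.value
  })
  # label each p-value with its training-set percentage
  names(pValues) <- paste0(100 * trainFractions, "%")
  print(pValues)
}
```

Because the splits are random, the p-values from this loop need not match those of the three separate calls unless the random seed is reset in the same way before each run.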


[Package BAGofT version 1.0.0 Index]