thinRandomGLM {randomGLM} | R Documentation |
Random generalized linear model predictor thinning
Description
This function allows the user to define a "thinned" version of a random generalized linear model predictor by focusing on those features that occur relatively frequently.
Usage
thinRandomGLM(rGLM, threshold)
Arguments
rGLM |
a |
threshold |
integer specifying the minimum number of times a feature was selected across the bags in
|
Details
The function "thins out" (reduces) a previously constructed random generalized linear model predictor by removing rarely selected features and refitting each (generalized) linear model (GLM). The GLM in each bag is refit using only those features that occur more than threshold times across all nBags bags. The occurrence count excludes interactions; in other words, the threshold is applied to the first row of timesSelectedByForwardRegression.
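The selection rule described above can be sketched in base R. This is only an illustration, not the package's implementation: the matrix timesSelected, its row and column names, and the threshold value are all made up for the example, and the comparison is written as strictly "more than" because that is how the text above reads (whether the package itself uses a strict comparison is an assumption here).

```r
# Toy selection-count matrix (hypothetical data, not produced by randomGLM).
# Row 1: times each feature was selected as a main effect across the bags.
# Row 2: times each feature appeared in an interaction term (ignored below).
timesSelected <- matrix(
  c(0, 7, 23, 0,
    0, 2,  5, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("main", "interaction"),
                  c("f1", "f2", "f3", "f4"))
)
threshold <- 5

# Apply the threshold to the first row only, reading "more than" literally.
keep <- timesSelected[1, ] > threshold
keptFeatures <- names(keep)[keep]
print(keptFeatures)
```

Note that f4 is dropped even though it appears in an interaction, because only the first row is counted.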
Value
The function returns a valid randomGLM object (see randomGLM for details) that can be used as input to the predict() method (see predict.randomGLM). The returned object contains a copy of the input rGLM in which the following components were modified:
predictedOOB |
the updated continuous prediction (if |
predictedOOB.response |
In case of a binary outcome, the updated predicted probability of each
outcome
specified by |
featuresInForwardRegression |
features selected by forward selection in each bag. A list with one
component per bag. Each component
is a matrix with |
coefOfForwardRegression |
coefficients of forward regression. A list with one
component per bag. Each component is a vector giving the coefficients of the model determined by forward
selection in the corresponding bag. The order of the coefficients is the same as the order of the terms in
the corresponding component of |
interceptOfForwardRegression |
a vector with one component per bag giving the intercept of the regression model in each bag. |
timesSelectedByForwardRegression |
a matrix of |
models |
the "thinned" regression models for each bag. |
Author(s)
Lin Song, Steve Horvath, Peter Langfelder
References
Lin Song, Peter Langfelder, Steve Horvath: Random generalized linear model: a highly accurate and interpretable ensemble predictor. BMC Bioinformatics (2013)
Examples
## binary outcome prediction
# load the package and the example data
library(randomGLM)
data(iris)
# Restrict data to the first 100 observations (two species: setosa and versicolor)
iris = iris[1:100, ]
# Re-create the Species factor to drop the now-unused third level
iris$Species = as.factor(as.character(iris$Species))
# Select a training and a test subset of the 100 observations
set.seed(1)
indx = sample(100, 67, replace=FALSE)
xyTrain = iris[indx,]
xyTest = iris[-indx,]
xTrain = xyTrain[, -5]
yTrain = xyTrain[, 5]
xTest = xyTest[, -5]
yTest = xyTest[, 5]
# predict with a small number of bags - normally nBags should be at least 100.
RGLM = randomGLM(
  xTrain, yTrain,
  nCandidateCovariates = ncol(xTrain),
  nBags = 30,
  keepModels = TRUE, nThreads = 1)
# Tabulate how often each feature was selected (first row, interactions excluded)
table(RGLM$timesSelectedByForwardRegression[1, ])
# 0 7 23
# 2 1 1
# Thin the predictor, then predict on the test set with both versions
thinnedRGLM = thinRandomGLM(RGLM, threshold = 7)
predictedThinned = predict(thinnedRGLM, newdata = xTest, type = "class")
predictedFull = predict(RGLM, newdata = xTest, type = "class")