predict.boostmtree {boostmtree}    R Documentation
Prediction for Boosted multivariate trees for longitudinal data.
Description
Obtain predicted values. Also returns test-set performance if the test data contains y-outcomes.
Usage
## S3 method for class 'boostmtree'
predict(object,
x,
tm,
id,
y,
M,
eps = 1e-5,
useCVflag = FALSE,
...)
Arguments
object: A boosting object of class (boostmtree, grow).

x: Data frame (or matrix) containing test set x-values. Rows must be duplicated to match the number of time points for an individual. If missing, the training x-values are used and tm, id and y are not required.

tm: Time values for each test set individual, with one entry for each row of x.

id: Unique subject identifier, one entry for each row in x.

y: Test set y-values, with one entry for each row in x.

M: Fixed value for the boosting step number. Leave this empty to determine the optimized value obtained by minimizing test-set error.

eps: Tolerance value used for determining the optimal M.

useCVflag: Should the predicted value be based on the estimate derived from the out-of-bag (OOB) sample?

...: Further arguments passed to or from other methods.
Details
The predicted time profile and performance values are obtained for test data from the boosted object grown on the training data.
R-side parallel processing is implemented by replacing the R function lapply with mclapply from the parallel package. You can set the number of cores accessed by mclapply by issuing the command options(mc.cores = x), where x is the number of cores. For example, the following command uses all available cores:

options(mc.cores = detectCores())

However, this can create high RAM usage, especially when using the function partialPlot, which calls the predict function.
Note that all performance values (for example prediction error) are standardized by the overall y-standard deviation. Thus, reported RMSE (root-mean-squared-error) is actually standardized RMSE. Values are reported at the optimal stopping time.
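As a concrete illustration of this standardization, consider the following minimal sketch. It uses made-up vectors y.test and y.pred (hypothetical observed and predicted values, not output of the package) and shows only the arithmetic: ordinary RMSE divided by the overall standard deviation of the observed y-values.

```r
## Minimal sketch of standardized RMSE (not the package's internal code).
## y.test and y.pred are hypothetical observed and predicted values.
y.test <- c(1.0, 2.5, 3.0, 4.5)
y.pred <- c(1.2, 2.0, 3.1, 4.0)

## ordinary root-mean-squared error
rmse <- sqrt(mean((y.test - y.pred)^2))

## standardized by the overall y-standard deviation
std.rmse <- rmse / sd(y.test)
```

A standardized RMSE near 1 indicates performance no better than predicting the overall mean; values well below 1 indicate genuine predictive signal.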
Value
An object of class (boostmtree, predict)
, which is a list with the
following components:
boost.obj: The original boosting object.

x: The test x-values, but with only one row per individual (i.e., duplicated rows are removed).

time: List with each component containing the time points for a given test individual.

id: Sorted subject identifier.

y: List containing the test y-values.

Y: y-values in list format, where a nominal or ordinal response is converted into binary responses.

family: Family of y.

ymean: Overall mean of y-values for all individuals. Equals 0 for non-continuous families.

ysd: Overall standard deviation of y-values for all individuals. Equals 1 for non-continuous families.

xvar.names: X-variable names.

K: Number of terminal nodes.

n: Total number of subjects.

ni: Number of repeated measures for each subject.

n.Q: Number of class labels for the non-continuous response.

Q_set: Class labels for the non-continuous response.

y.unq: Unique y-values for the non-continuous response.

nu: Boosting regularization parameter.

D: Design matrix for each subject.

df.D: Number of columns of D.

time.unq: Vector of the unique time points.

baselearner: List of length M containing the base learners.

gamma: List of length M, with each component containing the boosted tree fitted values.

membership: List of length M, with each component containing the terminal node membership for a given boosting iteration.

mu: Estimated mean profile at the optimized M.

Prob_class: For family == "Ordinal", this provides individual class probabilities rather than cumulative probabilities.

muhat: Extrapolated mean profile to all unique time points, evaluated at the optimized M.

Prob_hat_class: Extrapolated Prob_class to all unique time points.

err.rate: Test set standardized l1-error and RMSE.

rmse: Test set standardized RMSE at the optimized M.

Mopt: The optimized M.
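Because the returned object is an ordinary R list, the components above can be inspected with $. The following minimal sketch mimics that structure with synthetic stand-in values (it does not run the boosting model; Mopt = 115 and rmse = 0.61 are made up for illustration):

```r
## Sketch: a (boostmtree, predict) object is a list, so components from the
## Value section are accessed with $. The values here are synthetic stand-ins.
p <- list(Mopt = 115, rmse = 0.61)
cat("optimized boosting step:", p$Mopt, "\n")
cat("standardized test RMSE :", p$rmse, "\n")
```

In practice p would be the result of predict(), as in the Examples section.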
Author(s)
Hemant Ishwaran, Amol Pande and Udaya B. Kogalur
References
Pande A., Li L., Rajeswaran J., Ehrlinger J., Kogalur U.B., Blackstone E.H., Ishwaran H. (2017). Boosted multivariate trees for longitudinal data, Machine Learning, 106(2): 277–305.
See Also
plot.boostmtree
,
print.boostmtree
Examples
## Not run:
##------------------------------------------------------------
## Synthetic example (Response is continuous)
##
## High correlation, quadratic time with quadratic interaction
## largish number of noisy variables
##
## Illustrates how modified gradient improves performance
## also compares performance to ideal and well specified linear models
##----------------------------------------------------------------------------
## simulate the data
## simulation 2: main effects (x1, x3, x4), quad-time-interaction (x2)
dtaO <- simLong(n = 100, ntest = 100, model = 2, family = "Continuous", q = 25)
## save the data as both a list and data frame
dtaL <- dtaO$dtaL
dta <- dtaO$dta
## get the training data
trn <- dtaO$trn
## save formulas for linear model comparisons
f.true <- dtaO$f.true
f.linr <- "y~g( x1+x2+x3+x4+x1*time+x2*time+x3*time+x4*time )"
## modified tree gradient (default)
o.1 <- boostmtree(dtaL$features[trn, ], dtaL$time[trn], dtaL$id[trn], dtaL$y[trn],
                  family = "Continuous", M = 350)
p.1 <- predict(o.1, dtaL$features[-trn, ], dtaL$time[-trn], dtaL$id[-trn], dtaL$y[-trn])
## non-modified tree gradient (nmtg)
o.2 <- boostmtree(dtaL$features[trn, ], dtaL$time[trn], dtaL$id[trn], dtaL$y[trn],
                  family = "Continuous", M = 350, mod.grad = FALSE)
p.2 <- predict(o.2, dtaL$features[-trn, ], dtaL$time[-trn], dtaL$id[-trn], dtaL$y[-trn])
## set rho = 0
o.3 <- boostmtree(dtaL$features[trn, ], dtaL$time[trn], dtaL$id[trn], dtaL$y[trn],
                  family = "Continuous", M = 350, rho = 0)
p.3 <- predict(o.3, dtaL$features[-trn, ], dtaL$time[-trn], dtaL$id[-trn], dtaL$y[-trn])
## rmse values compared to generalized least squares (GLS)
## for the true model and a well-specified linear model (LM)
cat("true LM :", boostmtree:::gls.rmse(f.true,dta,trn),"\n")
cat("well specified LM :", boostmtree:::gls.rmse(f.linr,dta,trn),"\n")
cat("boostmtree :", p.1$rmse,"\n")
cat("boostmtree (nmtg):", p.2$rmse,"\n")
cat("boostmtree (rho=0):", p.3$rmse,"\n")
## predicted value plots
plot(p.1)
plot(p.2)
plot(p.3)
##------------------------------------------------------------
## Synthetic example (Response is binary)
##
## High correlation, quadratic time with quadratic interaction
## largish number of noisy variables
##----------------------------------------------------------------------------
## simulate the data
## simulation 2: main effects (x1, x3, x4), quad-time-interaction (x2)
dtaO <- simLong(n = 100, ntest = 100, model = 2, family = "Binary", q = 25)
## save the data as both a list and data frame
dtaL <- dtaO$dtaL
dta <- dtaO$dta
## get the training data
trn <- dtaO$trn
## save formulas for linear model comparisons
f.true <- dtaO$f.true
f.linr <- "y~g( x1+x2+x3+x4+x1*time+x2*time+x3*time+x4*time )"
## modified tree gradient (default)
o.1 <- boostmtree(dtaL$features[trn, ], dtaL$time[trn], dtaL$id[trn], dtaL$y[trn],
                  family = "Binary", M = 350)
p.1 <- predict(o.1, dtaL$features[-trn, ], dtaL$time[-trn], dtaL$id[-trn], dtaL$y[-trn])
## End(Not run)