model.diagnostics {ModelMap}    R Documentation
Model Predictions and Diagnostics
Description
Takes model object and makes predictions, runs model diagnostics, and creates graphs and tables of the results.
Usage
model.diagnostics(model.obj = NULL, qdata.trainfn = NULL, qdata.testfn = NULL,
folder = NULL, MODELfn = NULL, response.name = NULL, unique.rowname = NULL,
diagnostic.flag=NULL, seed = NULL, prediction.type=NULL, MODELpredfn = NULL,
na.action = NULL, v.fold = 10, device.type = NULL, DIAGNOSTICfn = NULL,
res=NULL, jpeg.res = 72, device.width = 7, device.height = 7, units="in",
pointsize=12, cex=par()$cex, req.sens, req.spec, FPC, FNC, quantiles=NULL,
all=TRUE, subset = NULL, weights = NULL, mtry = NULL, controls = NULL,
xtrafo = NULL, ytrafo = NULL, scores = NULL)
Arguments

model.obj
The model object to use for making predictions and running diagnostics.

qdata.trainfn
String. The name (full path, or base name with path specified by folder) of the training data file.

qdata.testfn
String. The name (full path, or base name with path specified by folder) of the independent test data file.

folder
String. The folder used for all output from predictions and/or maps. Do not add an ending slash to the path string.

MODELfn
String. The file name to use to save the generated model object.

response.name
String. The name of the response variable used to build the model.

unique.rowname
String. The name of the unique identifier used to identify each row in the training data.

diagnostic.flag
String. The name of a column used to identify a subset of rows in the training data or test data to use for model diagnostics. This column must be either a logical vector (TRUE and FALSE) or a vector of zeros and ones.

seed
Integer. The number used to initialize randomization when building RF or SGB models. To reproduce the same model later, use the same seed.

prediction.type
String. Prediction type: "OOB", "CV", or "TEST" (see examples below).

MODELpredfn
String. Model validation. A character string used to construct the output file names for the validation diagnostics, for example the prediction output file.

na.action
String. Model validation. Specifies the action to take if there are NA values in the prediction data, or if a category of a categorical predictor was not present in the training data. Choices: "na.omit" or "na.roughfix" (see examples below).

v.fold
Integer (or logical FALSE). Model validation. The number of cross-validation folds. Defaults to 10.

device.type
String or vector of strings. Model validation. One or more device types for graphical output from model validation diagnostics. Choices include "default", "jpeg", and "pdf" (see examples below).

DIAGNOSTICfn
String. Model validation. Name used as the base to create names for output files from model validation diagnostics. The filename can be the full path, or it can be the simple basename, in which case the output will be to the folder specified by folder.

res
Integer. Model validation. Pixels per inch for jpeg, png, and tiff plots. The default is 72 dpi, good for on-screen viewing. For printing, the suggested setting is 300 dpi.

jpeg.res
Integer. Model validation. Deprecated. Ignored unless res is not supplied.

device.width
Integer. Model validation. The device width for diagnostic plots, in inches.

device.height
Integer. Model validation. The device height for diagnostic plots, in inches.

units
Model validation. The units in which device.width and device.height are given. Defaults to "in" (inches).

pointsize
Integer. Model validation. The default pointsize of plotted text, interpreted as big points (1/72 inch).

cex
Numeric. Model validation. The cex for diagnostic plots.

req.sens
Numeric. Model validation. The required sensitivity for threshold optimization for binary response model evaluation.

req.spec
Numeric. Model validation. The required specificity for threshold optimization for binary response model evaluation.

FPC
Numeric. Model validation. The False Positive Cost for threshold optimization for binary response model evaluation.

FNC
Numeric. Model validation. The False Negative Cost for threshold optimization for binary response model evaluation.

quantiles
Numeric vector. QRF models. The quantiles to predict: a numeric vector with values between zero and one. If the model was built without specifying quantiles, quantile importance cannot be calculated.

all
Logical. QRF models.

subset
CF models. NOT SUPPORTED. Cross-validation diagnostics are not available for CF models built with this argument (see Note).

weights
CF models. NOT SUPPORTED. Cross-validation diagnostics are not available for CF models built with this argument (see Note).

mtry
Integer. CF models. Only needed for cross-validation diagnostics; must match the value used in model.build() (see Note).

controls
CF models. Only needed for cross-validation diagnostics; must match the value used in model.build() (see Note).

xtrafo
CF models. Only needed for cross-validation diagnostics; must match the value used in model.build() (see Note).

ytrafo
CF models. Only needed for cross-validation diagnostics; must match the value used in model.build() (see Note).

scores
CF models. NOT SUPPORTED. Cross-validation diagnostics are not available for CF models built with this argument (see Note).
Details
model.diagnostics() takes a model object, makes predictions, runs model diagnostics, and creates graphs and tables of the results.
model.diagnostics() can be run in a traditional R command mode, where all arguments are specified in the function call. However, it can also be used in a full push-button mode, where you type in the simple command model.diagnostics(), and GUI pop-up windows will ask questions about the type of model, the file locations of the data, etc.
When running model.diagnostics() on non-Windows platforms, file names and folders need to be specified in the argument list, but other push-button selections are handled by the select.list() function, which is platform independent.
Diagnostic predictions are made by one of four methods, and a text file is generated consisting of three columns: observation ID, observed values, and predicted values. If prediction.type = "CV", an additional column indicates which cross-validation fold each observation fell into. If the model's response type is categorical then, in addition to a column giving the category predicted by majority vote, there are also columns for each possible response category giving the proportion of trees that predicted that category.
A variable importance graph is made. If response.type = "categorical", category-specific graphs are generated for variable importance. These show how much the model accuracy for each category is affected when the values of each predictor variable are randomly permuted.
The package corrplot is used to generate a plot of correlation between predictor variables. If there are highly correlated predictor variables, then the variable importances of "RF" and "QRF" models need to be interpreted with care, and users may want to consider looking at the conditional variable importances available for "CF" models produced by the party package.
If model.type = "RF", the OOB error is plotted as a function of the number of trees in the model. If response.type = "binary" or response.type = "categorical", category-specific graphs are generated for OOB error as a function of the number of trees.
If response.type = "binary", a summary graph is made using the PresenceAbsence package, and *.csv spreadsheets are created of thresholds optimized by several methods, with their associated error statistics and predicted prevalence.
If response.type = "continuous", a scatterplot of observed vs. predicted values is created with a simple linear regression line. The graph is labeled with the slope and intercept of this line, as well as Pearson's and Spearman's correlation coefficients.
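The two correlation coefficients printed on that scatterplot can be reproduced with base R's cor(); the observed and predicted values below are toy numbers for illustration, not ModelMap output:

```r
# Toy observed and predicted values (illustrative only):
obs  <- c(10, 25, 40, 55)
pred <- c(12, 22, 43, 50)

# Pearson's correlation measures linear agreement:
r.pearson  <- cor(obs, pred, method = "pearson")

# Spearman's correlation measures rank agreement; both vectors
# increase together here, so it is exactly 1:
r.spearman <- cor(obs, pred, method = "spearman")
```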
If response.type = "categorical", a confusion matrix is generated that includes errors of omission and commission, as well as Kappa, Percent Correctly Classified (PCC), and the Multicategorical Area Under the Curve (MAUC) as defined by Hand and Till (2001) and calculated by the package HandTill2001.
Value
The function will return a data frame of the row ID and the observed and predicted values.
For binary response models the predicted probability of presence is returned.
For categorical response models the predicted category (by majority vote) is returned, as well as a column for each category giving the probability of that category. If necessary, make.names is applied to the categories to create valid column names.
For continuous response models the predicted value is returned.
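make.names is base R; for hypothetical category labels containing spaces or slashes (invented here, not from any ModelMap dataset), it produces valid column names like these:

```r
# Hypothetical category labels (illustrative only):
categories <- c("mixed conifer", "non-forest/other")

# make.names() replaces characters that are invalid in R names with ".":
col.names <- make.names(categories)
# "mixed.conifer" "non.forest.other"
```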
If prediction.type = "CV", the data frame also includes a column indicating which cross-validation fold each datapoint was in.
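As a sketch of how this return value can be used (column names and values below are invented for illustration, not guaranteed by ModelMap), the returned data frame for a binary response model supports simple follow-up checks:

```r
# Invented example of the kind of data frame returned for a binary model:
pred.oob <- data.frame(
  ID   = c("a1", "a2", "a3", "a4"),  # unique.rowname values
  obs  = c(1, 0, 1, 0),              # observed presence/absence
  pred = c(0.83, 0.21, 0.46, 0.09)   # predicted probability of presence
)

# Fraction of observations correctly classified at a simple 0.5 threshold:
acc <- mean((pred.oob$pred > 0.5) == (pred.oob$obs == 1))
```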
Note
Importance currently unavailable for QRF models.
If you are running cross-validation diagnostics on a CF model, the model parameters will NOT automatically be passed to model.diagnostics(). For cross-validation, it is the user's responsibility to be certain that the CF arguments are the same in model.build() and model.diagnostics().
Also, for some CF model parameters (subset, weights, and scores) ModelMap only provides OOB and independent test set diagnostics, and does not support cross-validation diagnostics.
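Since these CF arguments are not carried over automatically, one way to keep model.build() and model.diagnostics() in sync is to store them once in a list and splice them into both calls; the values below are purely illustrative, not recommendations:

```r
# Store the CF arguments once so both calls are guaranteed to agree
# (values are illustrative only):
cf.args <- list(mtry = 3, controls = NULL, xtrafo = NULL, ytrafo = NULL)

# Then splice them into both calls with do.call(), e.g.:
# model.obj <- do.call(model.build,
#                      c(list(model.type = "CF", qdata.trainfn = qdata.trainfn),
#                        cf.args))
# PRED.CV   <- do.call(model.diagnostics,
#                      c(list(model.obj = model.obj, prediction.type = "CV"),
#                        cf.args))
```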
Author(s)
Elizabeth Freeman and Tracey Frescino
References
Breiman, L. (2001) Random Forests. Machine Learning, 45:5-32.
Elith, J., Leathwick, J. R. and Hastie, T. (2008). A working guide to boosted regression trees. Journal of Animal Ecology. 77:802-813.
Hand, D. J., & Till, R. J. (2001). A simple generalisation of the area under the ROC curve for multiple class classification problems. Machine Learning, 45(2), 171-186.
Liaw, A. and Wiener, M. (2002). Classification and Regression by randomForest. R News 2(3), 18–22.
Ridgeway, G. (1999). The state of boosting. Computing Science and Statistics, 31:172-181.
See Also
get.test, model.build, model.mapmake
Examples
## Not run:
###########################################################################
############################# Run this set up code: #######################
###########################################################################
# set seed:
seed=38
# Define training and test files:
qdata.trainfn = system.file("extdata", "helpexamples","DATATRAIN.csv", package = "ModelMap")
qdata.testfn = system.file("extdata", "helpexamples","DATATEST.csv", package = "ModelMap")
# Define folder for all output:
folder=getwd()
#identifier for individual training and test data points
unique.rowname="ID"
###########################################################################
############## Pick one of the following sets of definitions: #############
###########################################################################
########## Continuous Response, Continuous Predictors ############
#file name to store model:
MODELfn="RF_Bio_TC"
#predictors:
predList=c("TCB","TCG","TCW")
#define which predictors are categorical:
predFactor=FALSE
# Response name and type:
response.name="BIO"
response.type="continuous"
########## binary Response, Continuous Predictors ############
#file name to store model:
MODELfn="RF_CONIFTYP_TC"
#predictors:
predList=c("TCB","TCG","TCW")
#define which predictors are categorical:
predFactor=FALSE
# Response name and type:
response.name="CONIFTYP"
# This variable is 1 if a conifer or mixed conifer type is present,
# otherwise 0.
response.type="binary"
########## Continuous Response, Categorical Predictors ############
# In this example, NLCD is a categorical predictor.
#
# You must decide what you want to happen if there are categories
# present in the data to be predicted (either the validation/test set
# or in the image file) that were not present in the original training data.
# Choices:
# na.action = "na.omit"
# Any validation datapoint or image pixel with a value for any
# categorical predictor not found in the training data will be
# returned as NA.
# na.action = "na.roughfix"
# Any validation datapoint or image pixel with a value for any
# categorical predictor not found in the training data will have
# the most common category for that predictor substituted,
# and a prediction will be made.
# You must also let R know which of the predictors are categorical, in other
# words, which ones R needs to treat as factors.
# This vector must be a subset of the predictors given in predList
#file name to store model:
MODELfn="RF_BIO_TCandNLCD"
#predictors:
predList=c("TCB","TCG","TCW","NLCD")
#define which predictors are categorical:
predFactor=c("NLCD")
# Response name and type:
response.name="BIO"
response.type="continuous"
###########################################################################
########################### build model: ##################################
###########################################################################
### create model ###
model.obj = model.build( model.type="RF",
qdata.trainfn=qdata.trainfn,
folder=folder,
unique.rowname=unique.rowname,
MODELfn=MODELfn,
predList=predList,
predFactor=predFactor,
response.name=response.name,
response.type=response.type,
seed=seed,
na.action="na.roughfix"
)
###########################################################################
#### Then Run this code make validation predictions and diagnostics: ######
###########################################################################
### for Out-of-Bag predictions ###
MODELpredfn<-paste(MODELfn,"_OOB",sep="")
PRED.OOB<-model.diagnostics( model.obj=model.obj,
qdata.trainfn=qdata.trainfn,
folder=folder,
unique.rowname=unique.rowname,
# Model Validation Arguments
prediction.type="OOB",
MODELpredfn=MODELpredfn,
device.type=c("default","jpeg","pdf"),
na.action="na.roughfix"
)
PRED.OOB
### for Cross-Validation predictions ###
#MODELpredfn<-paste(MODELfn,"_CV",sep="")
#PRED.CV<-model.diagnostics( model.obj=model.obj,
# qdata.trainfn=qdata.trainfn,
# folder=folder,
# unique.rowname=unique.rowname,
# seed=seed,
# # Model Validation Arguments
# prediction.type="CV",
# MODELpredfn=MODELpredfn,
# device.type=c("default","jpeg","pdf"),
# v.fold=10,
# na.action="na.roughfix"
#)
#PRED.CV
### for Independent Test Set predictions ###
#MODELpredfn<-paste(MODELfn,"_TEST",sep="")
#PRED.TEST<-model.diagnostics( model.obj=model.obj,
# qdata.testfn=qdata.testfn,
# folder=folder,
# unique.rowname=unique.rowname,
# # Model Validation Arguments
# prediction.type="TEST",
# MODELpredfn=MODELpredfn,
# device.type=c("default","jpeg","pdf"),
# na.action="na.roughfix"
#)
#PRED.TEST
## End(Not run) # end dontrun