model.explore {ModelMap}    R Documentation
Exploratory data analysis
Description
Graphically explores the relationships between the training data and the predictor rasters.
Usage
model.explore(qdata.trainfn = NULL, folder = NULL, predList = NULL,
predFactor = FALSE, response.name = NULL, response.type = NULL,
response.colors = NULL, unique.rowname = NULL, OUTPUTfn = NULL,
device.type = NULL, allow.default.graphics=FALSE, res=NULL, jpeg.res = 72,
MAXCELL=100000, device.width = NULL, device.height = NULL, units="in",
pointsize=12, cex=1, rastLUTfn = NULL, create.extrapolation.masks = FALSE,
na.value = -9999, col.ramp = rainbow(101, start = 0, end = 0.5),
col.cat = palette()[-1])
Arguments
qdata.trainfn
String. The name (full path or base name with path specified by
folder
String. The folder used for all output from predictions and/or maps. Do not add ending slash to path string. If
predList
String. A character vector of the predictor short names used to build the model. These names must match the column names in the training/test data files and the names in column two of the
predFactor
String. A character vector of predictor short names of the predictors from
response.name
String. The name of the response variable used to build the model. If
response.type
String. Response type:
response.colors
Data frame. A two-column data frame. Column names must be:
unique.rowname
String. The name of the unique identifier used to identify each row in the training data. If
OUTPUTfn
String. Filename that output file names will be based on.
device.type
String or vector of strings. Model validation. One or more device types for graphical output from model validation diagnostics. Current choices:
Note that the
allow.default.graphics
Logical. Should the default on-screen graphics device be allowed? USE WITH CAUTION! These graphics are complicated and slow to produce. If the on-screen default graphics device is moved or closed before the plot is completed, it can crash the entire R session.
res
Integer. Model validation. Pixels per inch for jpeg, png, and tiff plots. The default is 72 dpi, good for on-screen viewing. For printing, the suggested setting is 300 dpi.
jpeg.res
Integer. Graphical output. Deprecated. Ignored unless
MAXCELL
Integer. Graphical output. The maximum number of raster cells used to create the graphical output. Rasters larger than this value will be subsampled for the graphical maps and figures. The default value of MAXCELL is 100000.
device.width
Integer. Model validation. The device width for diagnostic plots in inches.
device.height
Integer. Model validation. The device height for diagnostic plots in inches.
units
Model validation. The units in which
pointsize
Integer. Model validation. The default pointsize of plotted text, interpreted as big points (1/72 inch) at
cex
Numeric. Model validation. The cex (character expansion factor) for diagnostic plots.
rastLUTfn
String. The file name (full path or base name with path specified by Example of comma-delimited file:
create.extrapolation.masks
Logical. If
na.value
Value used in rasters to indicate NA (missing) values.
col.ramp
Color ramp to use for continuous predictors.
col.cat
Vector. Vector of colors to use for categorical predictors.
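The look-up table format is only partly visible above: the Examples put the predictor raster filenames in column 1, and the predList description refers to predictor names in column two. The sketch below builds such a table as a data frame and writes it as a header-less comma-delimited file. The third column (band number), the placeholder file names, and the column names V1 to V3 are assumptions for illustration, not taken from this page.

```r
# Hypothetical rastLUT for the predictors used in the Examples.
# Column 1: raster file (placeholder names); column 2: predictor short name,
# which must match the training-data column names; column 3: band number
# (ASSUMED layout, not confirmed by this page).
predList <- c("TCB", "TCG", "TCW", "NLCD")
rastLUT <- data.frame(
  V1 = c(rep("hypothetical_tc.img", 3), "hypothetical_nlcd.img"),
  V2 = predList,
  V3 = c(1L, 2L, 3L, 1L),
  stringsAsFactors = FALSE
)

# Check that every predictor appears exactly once in column 2.
stopifnot(setequal(rastLUT$V2, predList), anyDuplicated(rastLUT$V2) == 0)

# Write it as a header-less comma-delimited file, then read it back the
# same way the Examples section reads LUT_2001.csv.
lut_file <- tempfile(fileext = ".csv")
write.table(rastLUT, lut_file, sep = ",", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
lut_back <- read.table(lut_file, header = FALSE, sep = ",",
                       stringsAsFactors = FALSE)
```

The Examples pass the loaded data frame (rastLUT.2001) directly as rastLUTfn, so either the file or the data frame form can be used there.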
Details
The model.explore function is intended to aid with preliminary data exploration before model building. It includes graphical tools to explore the relationships between the training data (both predictors and responses) and the predictor rasters. It uses the corrplot package to create a correlation plot of the continuous predictors. This can aid in interpreting the model.importance.plot output from the models, as Random Forest models divide importance between correlated predictors, while Stochastic Gradient Boosting models assign the majority of the importance to the correlated predictor that is used earliest in the model.
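As a rough illustration of what such a correlation plot summarizes, the sketch below computes the correlation matrix of the continuous predictors only (those not listed in predFactor). The data are synthetic stand-ins, not DATATRAIN.csv, and this is not model.explore's internal code. If the corrplot package is installed, corrplot::corrplot(corMat) would draw the matrix.

```r
# Synthetic stand-in for training data (made-up values, not DATATRAIN.csv).
set.seed(42)
n <- 100
TCB <- rnorm(n, 3000, 500)
qdata <- data.frame(
  TCB  = TCB,
  TCG  = 0.5 * TCB + rnorm(n, 0, 100),  # deliberately correlated with TCB
  TCW  = rnorm(n, -1000, 200),
  NLCD = sample(c(41, 42, 52), n, replace = TRUE)
)
predList   <- c("TCB", "TCG", "TCW", "NLCD")
predFactor <- c("NLCD")

# Categorical predictors are left out; only continuous ones are correlated.
contPreds <- setdiff(predList, predFactor)
corMat <- cor(qdata[, contPreds])
round(corMat, 2)

# Optional plot, if corrplot is installed:
# corrplot::corrplot(corMat)
```

A strongly correlated pair such as TCB and TCG here is exactly the case where Random Forest and Stochastic Gradient Boosting importance rankings can be expected to differ.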
The model.explore function can also help identify whether additional training data are needed. For example, the maps of the extrapolation masks for the predictor rasters help spot areas of the map where the pixels lie outside the range of the training data; any model predictions there will be extrapolations, and possibly unreliable. The user can then either collect additional training data or mask out these areas of the final prediction output of model.mapmake.
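Masking out extrapolated areas amounts to discarding predictions wherever the overall extrapolation mask is 0. The sketch below works on plain numeric vectors standing in for raster cell values (an assumption; reading and writing the actual model.mapmake and mask rasters is not shown), and the helper name mask_extrapolated is hypothetical.

```r
# Hypothetical helper: blank out predictions where the overall extrapolation
# mask is 0 (or missing). Vectors stand in for raster cell values, in the
# same cell order for both inputs.
mask_extrapolated <- function(pred, mask) {
  stopifnot(length(pred) == length(mask))
  pred[is.na(mask) | mask == 0] <- NA
  pred
}

pred <- c(12.5, 40.1, 7.3, 22.0)  # made-up predictions
mask <- c(1, 0, 1, 0)             # made-up overall mask
mask_extrapolated(pred, mask)     # returns c(12.5, NA, 7.3, NA)
```

Setting masked cells to NA, rather than multiplying by the mask, avoids confusing masked cells with genuine predictions of 0.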
To increase speed, the default behavior for large predictor rasters (more than MAXCELL cells) is to create the graphics from subsampled rasters. (Note: for categorical predictors, the full raster is always used to identify all categories found in the map area.) If create.extrapolation.masks=TRUE, then the full rasters are used for the extrapolation masks, regardless of the size of the rasters. This option runs much slower, as large rasters need to be read into R one block at a time.
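The subsampling rule can be sketched as follows: continuous predictor values are thinned to at most MAXCELL evenly spaced cells, while categorical values are kept in full. The helper values_for_graphics is hypothetical, and regular (evenly spaced) sampling is an assumption; this page does not say how model.explore chooses the subsample.

```r
# Hypothetical helper (not part of ModelMap): pick the cell values used for
# graphics. Continuous values are thinned to at most 'maxcell' evenly spaced
# cells; categorical values are never thinned, so no category is missed.
values_for_graphics <- function(x, maxcell = 100000, categorical = FALSE) {
  if (categorical || length(x) <= maxcell) {
    return(x)
  }
  # Evenly spaced indices from the first cell to the last cell.
  idx <- unique(round(seq(1, length(x), length.out = maxcell)))
  x[idx]
}

# Usage: a fake continuous "raster" and a fake land cover "raster".
set.seed(1)
cont <- runif(250000)
cat_codes <- sample(c(11, 41, 42, 52), 250000, replace = TRUE)
length(values_for_graphics(cont))  # thinned to 100000 values
sort(unique(values_for_graphics(cat_codes, categorical = TRUE)))
```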
Value
The function does not return a value, but it does create files.
Graphical files are created for each predictor variable, with the file type determined by device.type. In addition, if create.extrapolation.masks = TRUE, an extrapolation mask raster is created for each predictor, as well as an overall extrapolation mask. Each mask has the value 1 for pixels with predictor values within the range of the training data (or, for categorical predictors, with categories found in the training data), and the value 0 for pixels outside the range of the training data, with categories not found in the training data, or with NA values. The overall extrapolation mask has the value 0 if any of the predictors for that pixel are extrapolated. Note that this option is much slower to run.
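The masking rule above can be sketched on plain data frames, with one row per raster cell. The helper extrapolation_masks and its arguments are hypothetical, the inputs stand in for the training data and the predictor rasters, and the real function writes raster files rather than returning a data frame.

```r
# Hypothetical helper (not ModelMap's code): 0/1 extrapolation masks.
# 'train' = training data, 'cells' = predictor values, one row per raster cell.
extrapolation_masks <- function(train, cells, predList, predFactor = character(0)) {
  masks <- lapply(predList, function(p) {
    x <- cells[[p]]
    if (p %in% predFactor) {
      # Categorical: inside if the category occurs in the training data.
      inside <- x %in% unique(train[[p]])
    } else {
      # Continuous: inside if within the training-data range.
      rng <- range(train[[p]], na.rm = TRUE)
      inside <- x >= rng[1] & x <= rng[2]
    }
    # NA cells are always 0; otherwise 1 (inside) or 0 (outside).
    as.integer(!is.na(x) & inside)
  })
  names(masks) <- predList
  out <- data.frame(masks, check.names = FALSE)
  # Overall mask: 0 if any single predictor is extrapolated.
  out$overall <- as.integer(rowSums(out == 0) == 0)
  out
}

train <- data.frame(TCB = c(10, 20, 30), NLCD = c(41, 42, 41))
cells <- data.frame(TCB = c(15, 35, NA, 25), NLCD = c(41, 41, 42, 52))
m <- extrapolation_masks(train, cells, predList = c("TCB", "NLCD"),
                         predFactor = "NLCD")
m
```

Here the second cell is masked because TCB = 35 exceeds the training range, the third because TCB is NA, and the fourth because NLCD category 52 never occurs in the training data.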
Note
The default graphics device is disabled unless allow.default.graphics is set to TRUE. These graphics can be slow to produce, and if the on-screen graphics device is moved or closed while the graphic is in progress, it can crash R. It is recommended that graphics be written to a file by setting device.type (for example, to "jpeg" or "pdf").
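For comparison, plain base R writes a plot to a file by opening a file device, drawing, and closing the device with dev.off(); no on-screen window is involved, so nothing can be moved or closed mid-plot. This is a generic grDevices illustration, not model.explore's internal code; with model.explore itself, the equivalent choice is made through device.type.

```r
# Open a PDF file device (width/height in inches), draw, then close it.
out_file <- file.path(tempdir(), "explore_example.pdf")
pdf(out_file, width = 7, height = 5, pointsize = 12)
plot(rnorm(100), main = "Written to a file, not to the screen")
invisible(dev.off())  # closing the device finalizes the file
file.exists(out_file)
```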
Author(s)
Elizabeth Freeman
Examples
## Not run:
###########################################################################
############################# Run this set up code: #######################
###########################################################################
###Define training data file:
qdata.trainfn = system.file("extdata", "helpexamples","DATATRAIN.csv", package = "ModelMap")
###Define folder for all output:
folder=getwd()
###identifier for individual training and test data points
unique.rowname="ID"
###predictors:
predList=c("TCB","TCG","TCW","NLCD")
###define which predictors are categorical:
predFactor=c("NLCD")
###Create the filename (including path) for the raster look-up table (rastLUT) ###
rastLUTfn.2001 <- system.file( "extdata",
"helpexamples",
"LUT_2001.csv",
package="ModelMap")
###Load the rastLUT table, and add the path to the predictor raster filenames in column 1 ###
rastLUT.2001 <- read.table(rastLUTfn.2001,header=FALSE,sep=",",stringsAsFactors=FALSE)
for(i in 1:nrow(rastLUT.2001)){
rastLUT.2001[i,1] <- system.file("extdata",
"helpexamples",
rastLUT.2001[i,1],
package="ModelMap")
}
#################Continuous Response###################
###Response name and type:
response.name="BIO"
response.type="continuous"
###base file name for output files:
OUTPUTfn="BIO_TCandNLCD.img"
###run model.explore
model.explore( qdata.trainfn=qdata.trainfn,
folder=folder,
predList=predList,
predFactor=predFactor,
response.name=response.name,
response.type=response.type,
unique.rowname=unique.rowname,
OUTPUTfn=OUTPUTfn,
device.type="jpeg",
res=144,
# Raster arguments
rastLUTfn=rastLUT.2001,
na.value=-9999,
# colors for continuous predictors
col.ramp=rainbow(101,start=0,end=.5),
# colors for categorical predictors
col.cat=c("wheat1","springgreen2","darkolivegreen4",
"darkolivegreen2","yellow","thistle2",
"brown2","brown4")
)
## End(Not run) # end dontrun