mod_error {mosaicModel} | R Documentation
Mean square prediction error
Description
Compares model predictions to the actual values of the response variable. To do this, the testing data must contain both the input variables and the corresponding response variable. For a quantitative response variable, the measure calculated is the mean square prediction error (MSPE). For a categorical response variable, an analog of MSPE can be calculated (see Details), but by default the mean log-likelihood per case is computed instead.
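To make the MSPE calculation concrete, here is a minimal base-R sketch of the same idea for a linear model. The helper `mspe()` and the fixed 22-row split are hypothetical, introduced only for illustration; this is not the implementation used by `mod_error()`.

```r
# Hypothetical helper, not part of mosaicModel: mean square prediction error.
mspe <- function(model, testdata, response) {
  predicted <- predict(model, newdata = testdata)
  mean((testdata[[response]] - predicted)^2)
}

# Deterministic split, for illustration only; a random split is more typical.
train_rows <- seq_len(22)
training <- mtcars[train_rows, ]
testing  <- mtcars[-train_rows, ]

mod <- lm(mpg ~ hp + wt, data = training)

out_of_sample <- mspe(mod, testing, "mpg")   # error on held-out cases
in_sample     <- mspe(mod, training, "mpg")  # error on the training cases
```

Comparing `in_sample` with `out_of_sample` shows why separate testing data matters: the in-sample figure is computed on the same cases the model was fitted to.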
Usage
mod_error(model, testdata, error_type = c("default", "mse", "sse", "mad",
"LL", "mLL", "dev", "class_error"))
Arguments
model | The model whose prediction error is to be estimated.
testdata | A data frame giving both model inputs and the actual value of the response variable. If no testing data is provided, the training data will be used and a warning issued.
error_type | The measure of error you are interested in. By default, this is mean-square error for regression models and log-likelihood for classifiers. The choices are: "default", "mse", "sse", "mad", "LL", "mLL", "dev", "class_error".
Details
When the response variable is categorical, the model (called a 'classifier' in such situations) must be capable of computing probabilities for each output rather than just a bare category. This is true for many commonly encountered classifier model architectures.
The analog of the mean squared error for classifiers is the mean of (1-p)^2, where p is the probability assigned by the model to the actual output. This is a rough approximation to the log-likelihood. By default, the log-likelihood will be calculated, but for pedagogical reasons you may prefer (1-p)^2, in which case set error_type = "mse". Classifiers can assign a probability of zero to the actual output, in which case the log-likelihood is -Inf. The "mse" error type avoids this.
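The difference between the two classifier measures can be sketched in base R. The probability matrix and class labels below are made-up illustrative values, not output from any real model, and the variable names are hypothetical. The sketch assumes the log-likelihood is the sum of log(p) over cases.

```r
# Hypothetical class probabilities for four test cases (one row per case).
probs <- matrix(c(0.9, 0.10, 0.00,
                  0.3, 0.60, 0.10,
                  0.0, 0.25, 0.75,
                  0.0, 1.00, 0.00),
                ncol = 3, byrow = TRUE,
                dimnames = list(NULL, c("setosa", "versicolor", "virginica")))

# Hypothetical actual outcomes; the last case was given probability zero.
actual <- c("setosa", "versicolor", "virginica", "virginica")

# p: the probability the model assigned to the class that actually occurred.
p_actual <- probs[cbind(seq_along(actual), match(actual, colnames(probs)))]

# Analog of mean square error: mean of (1 - p)^2. Always finite.
mse_analog <- mean((1 - p_actual)^2)

# Log-likelihood and its per-case mean. One zero probability gives -Inf.
log_lik      <- sum(log(p_actual))
mean_log_lik <- mean(log(p_actual))
```

Here the single zero-probability case drives both log-likelihood values to -Inf, while the (1-p)^2 measure stays finite and simply counts that case as the worst possible error of 1.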
Examples
mod <- lm(mpg ~ hp + wt, data = mtcars)
mod_error(mod) # In-sample prediction error; warns that the training data is used.
## Not run:
classifier <- rpart::rpart(Species ~ ., data = iris)
mod_error(classifier)
mod_error(classifier, error_type = "LL")
# More typically, evaluate on held-out testing data
inds <- sample(seq_len(nrow(iris)), size = 100)
Training <- iris[inds, ]
Testing <- iris[-inds, ]
classifier <- rpart::rpart(Species ~ ., data = Training)
# This may well assign zero probability to events that appeared in the
# Testing data
mod_error(classifier, testdata = Testing)
mod_error(classifier, testdata = Testing, error_type = "mse")
## End(Not run)