R: Mean square prediction error

mod_error {mosaicModel}

R Documentation

Mean square prediction error

Description

Compares model predictions to the actual values of the response variable. To do this, the testing data must contain both the input variables and the corresponding response variable. For a quantitative response variable, the measure calculated is the mean square prediction error (MSPE). For categorical response variables, an analog of MSPE can be calculated (see Details), but by default a mean log-likelihood (per case) is computed instead.

Usage

mod_error(model, testdata, error_type = c("default", "mse", "sse", "mad",
  "LL", "mLL", "dev", "class_error"))

Arguments

model

The model whose prediction error is to be estimated.

testdata

A data frame giving both model inputs and the actual value of the response variable. If no testing data is provided, the training data will be used and a warning issued.

error_type

The measure of error you are interested in. By default, this is mean-square error for regression models and log-likelihood for classifiers. The choices are:

"mse" – mean square error
"sse" – sum of square errors
"mad" – mean absolute deviation
"LL" – log-likelihood
"mLL" – mean log-likelihood (per case in the testing data)
"dev" – deviance, plus a constant that is often zero. The constant is fixed for a given testing data set, regardless of the model, so differences between the deviances of two models are correct.
"class_error" – classification error rate.

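For a quantitative response, the first three measures in the list above can be reproduced by hand in base R. This is a minimal sketch under stated assumptions, not the package's implementation: the train/test split and the names train, test, and errors are hypothetical, and "mad" is taken here to mean the mean of the absolute prediction errors.

```r
# Hypothetical split of mtcars into training and testing rows
train <- mtcars[1:22, ]
test  <- mtcars[23:32, ]

mod <- lm(mpg ~ hp + wt, data = train)

# Prediction errors on the testing data: actual minus predicted
errors <- test$mpg - predict(mod, newdata = test)

sse <- sum(errors^2)       # sum of square errors
mse <- mean(errors^2)      # mean square error (sse divided by the number of cases)
mad <- mean(abs(errors))   # mean absolute deviation (assumed definition)

c(sse = sse, mse = mse, mad = mad)
```

Because sse grows with the number of testing cases while mse does not, mse is the easier of the two to compare across testing sets of different sizes.
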
Details

When the response variable is categorical, the model (called a 'classifier' in such situations) must be capable of computing probabilities for each output rather than just a bare category. This is true for many commonly encountered classifier model architectures.
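
As an illustration of what "computing probabilities for each output" looks like, the sketch below uses rpart, the classifier that appears in the Examples. It assumes the rpart package is installed; the names probs and p_actual are hypothetical and are not part of mosaicModel.

```r
# A classification tree that can report class probabilities
classifier <- rpart::rpart(Species ~ ., data = iris)

# One row per case, one column per level of Species
probs <- predict(classifier, newdata = iris, type = "prob")
head(probs)

# Probability the model assigned to the category actually observed,
# looked up by column name for each row
cols <- match(as.character(iris$Species), colnames(probs))
p_actual <- probs[cbind(seq_len(nrow(iris)), cols)]
summary(p_actual)
```

A model that can only return a bare category (for instance, type = "class" here) gives no p to work with, so the probability-based error measures cannot be computed from it.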

The analog of the mean squared error for classifiers is the mean of (1-p)^2, where p is the probability assigned by the model to the actual output. This is a rough approximation to the log-likelihood. By default, the log-likelihood will be calculated, but for pedagogical reasons you may prefer (1-p)^2, in which case set error_type = "mse". Classifiers can assign a probability of zero to the actual output, in which case the log-likelihood is -Inf. The "mse" error type avoids this.
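
The measures described above can be worked through on a small, made-up example. The probability matrix and the names probs, actual, and p below are hypothetical; the code is a sketch of the definitions in this section, not the code mod_error() runs.

```r
# Hypothetical predicted probabilities for four test cases and classes a, b, c
probs <- matrix(
  c(0.7, 0.2, 0.1,
    0.1, 0.8, 0.1,
    0.5, 0.5, 0.0,
    0.3, 0.3, 0.4),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("a", "b", "c"))
)
actual <- c("a", "b", "c", "c")

# p: probability assigned to the actual output of each case
p <- probs[cbind(seq_along(actual), match(actual, colnames(probs)))]

mse_analog <- mean((1 - p)^2)  # analog of mean square error; always finite
LL  <- sum(log(p))             # log-likelihood; -Inf here because case 3 has p = 0
mLL <- mean(log(p))            # mean log-likelihood per case; also -Inf here

# Classification error rate: share of cases whose most probable class is wrong
predicted <- colnames(probs)[apply(probs, 1, which.max)]
class_error <- mean(predicted != actual)

c(mse_analog = mse_analog, LL = LL, mLL = mLL, class_error = class_error)
```

The third case is the situation the paragraph above warns about: a single zero probability drives the log-likelihood to -Inf, while the (1-p)^2 measure only records the worst possible contribution of 1 for that case.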

Examples

mod <- lm(mpg ~ hp + wt, data = mtcars)
mod_error(mod) # In-sample prediction error; warns because no testdata is given.
## Not run: 
classifier <- rpart::rpart(Species ~ ., data = iris)
mod_error(classifier)
mod_error(classifier, error_type = "LL") 
# More typically, hold out some rows as testing data
inds <- sample(nrow(iris), size = 100)
Training <- iris[inds, ]
Testing  <- iris[-inds, ]
classifier <- rpart::rpart(Species ~ ., data = Training)
# This may well assign zero probability to events that appeared in the
# Testing data 
mod_error(classifier, testdata = Testing)
mod_error(classifier, testdata = Testing, error_type = "mse")

## End(Not run)

[Package mosaicModel version 0.3.0 Index]