quadBoundaryFunc {AppliedPredictiveModeling}    R Documentation

Functions for Simulating Data

Description

These functions simulate the two-class, two-predictor data sets that are used in the text.

Usage

quadBoundaryFunc(n)

easyBoundaryFunc(n, intercept = 0, interaction = 2)

Arguments

n

the sample size

intercept

the coefficient for the logistic regression intercept term

interaction

the coefficient for the logistic regression interaction term

Details

The quadBoundaryFunc function simulates data whose class boundary is a quadratic function of both predictors. The class probabilities come from a logistic regression model whose linear predictor (the log-odds) is: -1 - 2*X1 - 0.2*X1^2 + 2*X2^2. The predictors are multivariate normal with mean (1, 0) and a moderate degree of positive correlation.

Similarly, easyBoundaryFunc uses a logistic regression model whose linear predictor (the log-odds) is: intercept - 4*X1 + 4*X2 + interaction*X1*X2. The predictors are multivariate normal with mean (1, 0) and a strong positive correlation.
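
The two model equations above can be sketched in base R to show how a pair of predictor values maps to a class probability. This is only an illustration, not the package source: the helper names quad_logit and easy_logit are hypothetical, and the final sampling step (drawing the class label as a Bernoulli trial with the computed probability) is an assumption, since this page does not say how the class column is generated.

```r
## Hypothetical helpers: linear predictors (log-odds) copied from Details.
quad_logit <- function(x1, x2) -1 - 2 * x1 - 0.2 * x1^2 + 2 * x2^2
easy_logit <- function(x1, x2, intercept = 0, interaction = 2) {
  intercept - 4 * x1 + 4 * x2 + interaction * x1 * x2
}

## Inverse logit (plogis) turns log-odds into a probability.
## At the stated predictor mean (1, 0) the log-odds are -3.2 and -4.
prob_quad <- plogis(quad_logit(x1 = 1, x2 = 0))
prob_easy <- plogis(easy_logit(x1 = 1, x2 = 0))

## Assumed sampling step: a label is 'Class1' with probability p.
set.seed(1)
x1 <- c(-1, 0, 1)
x2 <- c(1, 0, -1)
p <- plogis(quad_logit(x1, x2))
cls <- factor(ifelse(runif(length(p)) <= p, "Class1", "Class2"),
              levels = c("Class1", "Class2"))
```

Because the X2^2 term enters with a positive coefficient, points far from X2 = 0 get a higher probability in this sketch, which is what makes the boundary curved rather than linear.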

Value

Both functions return data frames with columns

X1

numeric predictor value

X2

numeric predictor value

prob

numeric value giving the true probability of the first class

class

a factor variable with levels 'Class1' and 'Class2'

Author(s)

Max Kuhn

Examples

library(AppliedPredictiveModeling)

## in Chapter 11, 'Measuring Performance in Classification Models'
set.seed(975)
training <- quadBoundaryFunc(500)
testing <- quadBoundaryFunc(1000)
 

## in Chapter 20, 'Factors That Can Affect Model Performance'
set.seed(615)
dat <- easyBoundaryFunc(200, interaction = 3, intercept = 3)
## center and scale each predictor
dat$X1 <- scale(dat$X1)
dat$X2 <- scale(dat$X2)
dat$Data <- "Original"
## drop the true class probabilities
dat$prob <- NULL

## in Chapter X, 'An Introduction to Feature Selection'

set.seed(874)
reliefEx3 <- easyBoundaryFunc(500)
## center and scale each predictor
reliefEx3$X1 <- scale(reliefEx3$X1)
reliefEx3$X2 <- scale(reliefEx3$X2)
## drop the true class probabilities
reliefEx3$prob <- NULL


[Package AppliedPredictiveModeling version 1.1-7 Index]