npconmode {np}		R Documentation
Kernel Modal Regression with Mixed Data Types
Description
npconmode performs kernel modal regression on mixed data, and finds the conditional mode given a set of training data consisting of explanatory data and dependent data, and possibly evaluation data. It automatically computes various in-sample and out-of-sample measures of accuracy.
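A minimal sketch of the typical workflow, assuming the np package is attached and using simulated mixed data (the variable names and data-generating process are purely illustrative):

library("np")
set.seed(42)
n <- 250
x1 <- factor(rbinom(n, 1, 0.4))   # unordered factor predictor
x2 <- rnorm(n)                    # continuous predictor
y <- factor(ifelse(runif(n) < plogis(-1 + 2*(x1 == "1") + x2), "yes", "no"))
# First select bandwidths for the conditional density of y given (x1, x2),
# then locate the conditional mode and report accuracy measures
bw <- npcdensbw(formula = y ~ x1 + x2)
model <- npconmode(bws = bw)
summary(model)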
Usage
npconmode(bws, ...)
## S3 method for class 'formula'
npconmode(bws, data = NULL, newdata = NULL, ...)
## S3 method for class 'call'
npconmode(bws, ...)
## Default S3 method:
npconmode(bws, txdat, tydat, ...)
## S3 method for class 'conbandwidth'
npconmode(bws,
txdat = stop("invoked without training data 'txdat'"),
tydat = stop("invoked without training data 'tydat'"),
exdat,
eydat,
...)
Arguments
bws
a bandwidth specification. This can be set as a conbandwidth object returned from an invocation of npcdensbw, or as a vector of bandwidths, with each element i corresponding to the bandwidth for column i in txdat.

...
additional arguments supplied to specify the bandwidth type, kernel types, and so on, detailed below. This is necessary if you specify bws as a p+q-vector and not a conbandwidth object.

data
an optional data frame, list or environment (or object coercible to a data frame by as.data.frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which the function is called.

newdata
An optional data frame in which to look for evaluation data. If omitted, the training data are used.

txdat
a p-variate data frame of explanatory data (training data) used to calculate the regression estimators. Defaults to the training data used to compute the bandwidth object.

tydat
a one (1) dimensional vector of unordered or ordered factors, containing the dependent data. Defaults to the training data used to compute the bandwidth object.

exdat
a p-variate data frame of points on which the regression will be estimated (evaluation data). By default, evaluation takes place on the data provided by txdat.

eydat
a one (1) dimensional numeric or integer vector of the true values (outcomes) of the dependent variable. By default, evaluation takes place on the data provided by tydat.
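Out-of-sample evaluation is not shown in the Examples below, so the following is a hedged sketch of the conbandwidth method with explicit training and evaluation data (the simulated data and split indices are illustrative):

library("np")
set.seed(42)
n <- 300
X <- data.frame(x1 = factor(rbinom(n, 1, 0.4)), x2 = rnorm(n))
y <- factor(ifelse(runif(n) < plogis(-1 + 2*(X$x1 == "1") + X$x2), "yes", "no"))
itr <- 1:200    # training indices (illustrative split)
iev <- 201:300  # evaluation indices
bw <- npcdensbw(xdat = X[itr, ], ydat = y[itr])
# Supplying exdat/eydat evaluates the conditional mode on held-out data;
# with eydat present, npconmode also reports out-of-sample accuracy measures
model.oos <- npconmode(bws = bw,
                       txdat = X[itr, ], tydat = y[itr],
                       exdat = X[iev, ], eydat = y[iev])
summary(model.oos)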
Value
npconmode returns a conmode object with the following components:
conmode
a vector of type factor containing the conditional mode at each evaluation point

condens
a vector of numeric type containing the modal density estimates at each evaluation point

xeval
a data frame of evaluation points

yeval
a vector of type factor containing the actual outcomes of the dependent variable, if provided

confusion.matrix
the confusion matrix, or NA if outcomes are not available

CCR.overall
the overall correct classification ratio, or NA if outcomes are not available

CCR.byoutcome
a numeric vector containing the correct classification ratio by outcome, or NA if outcomes are not available

fit.mcfadden
the McFadden-Puig-Kerschner performance measure, or NA if outcomes are not available
The functions mode and fitted may be used to extract the conditional mode estimates and the conditional density estimates at the conditional mode, respectively, from the resulting object. Also, summary supports conmode objects.
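For instance, assuming model.np is a conmode object as produced in the Examples section below, the extractors named above can be applied as follows:

mode(model.np)      # conditional mode estimates at each evaluation point
fitted(model.np)    # conditional density estimates at the conditional mode
summary(model.np)   # confusion matrix, CCRs, and McFadden-Puig-Kerschner measure
model.np$conmode    # the conditional modes accessed directly as a component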
Usage Issues
If you are using data of mixed types, then it is advisable to use the data.frame function to construct your input data rather than cbind, since cbind will typically not work as intended on mixed data types and will coerce the data to the same type.
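A small illustration of the coercion issue (toy vectors, names illustrative):

x.num <- c(1.2, 3.4, 5.6)
x.fac <- factor(c("a", "b", "a"))
# cbind() returns a matrix of a single type: the factor is silently replaced
# by its underlying integer codes, so the categorical information is lost
bad <- cbind(x.num, x.fac)
# data.frame() keeps each column's type, which the np routines require
good <- data.frame(x.num, x.fac)
class(bad)          # "matrix" "array"
sapply(good, class) # "numeric" "factor"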
Author(s)
Tristen Hayfield tristen.hayfield@gmail.com, Jeffrey S. Racine racinej@mcmaster.ca
References
Aitchison, J. and C.G.G. Aitken (1976), “Multivariate binary discrimination by the kernel method,” Biometrika, 63, 413-420.
Hall, P. and J.S. Racine and Q. Li (2004), “Cross-validation and the estimation of conditional probability densities,” Journal of the American Statistical Association, 99, 1015-1026.
Li, Q. and J.S. Racine (2007), Nonparametric Econometrics: Theory and Practice, Princeton University Press.
McFadden, D. and C. Puig and D. Kerschner (1977), “Determinants of the long-run demand for electricity,” Proceedings of the American Statistical Association (Business and Economics Section), 109-117.
Pagan, A. and A. Ullah (1999), Nonparametric Econometrics, Cambridge University Press.
Scott, D.W. (1992), Multivariate Density Estimation. Theory, Practice and Visualization, New York: Wiley.
Silverman, B.W. (1986), Density Estimation, London: Chapman and Hall.
Wang, M.C. and J. van Ryzin (1981), “A class of smooth estimators for discrete distributions,” Biometrika, 68, 301-309.
Examples
## Not run:
# EXAMPLE 1 (INTERFACE=FORMULA): For this example, we use the
# birthweight data taken from the MASS library, and compute a parametric
# logit model and a nonparametric conditional mode model. We then
# compare their confusion matrices and summary measures of
# classification ability.
library("MASS")
data("birthwt")
attach(birthwt)
# Fit a parametric logit model with low (0/1) as the dependent
# variable and age, lwt, and smoke (0/1) as the covariates
# From ?birthwt
# 'low' indicator of birth weight less than 2.5kg
# 'smoke' smoking status during pregnancy
# 'race' mother's race ('1' = white, '2' = black, '3' = other)
# 'ht' history of hypertension
# 'ui' presence of uterine irritability
# 'ftv' number of physician visits during the first trimester
# 'age' mother's age in years
# 'lwt' mother's weight in pounds at last menstrual period
model.logit <- glm(low~factor(smoke)+
factor(race)+
factor(ht)+
factor(ui)+
ordered(ftv)+
age+
lwt,
family=binomial(link=logit))
# Generate the confusion matrix and correct classification ratio
cm <- table(low, ifelse(fitted(model.logit)>0.5, 1, 0))
ccr <- sum(diag(cm))/sum(cm)
# Now do the same with a nonparametric model. Note - this may take a
# few minutes depending on the speed of your computer...
bw <- npcdensbw(formula=factor(low)~factor(smoke)+
factor(race)+
factor(ht)+
factor(ui)+
ordered(ftv)+
age+
lwt)
model.np <- npconmode(bws=bw)
# Compare confusion matrices from the logit and nonparametric model
# Logit
cm
ccr
# Nonparametric
summary(model.np)
detach(birthwt)
# EXAMPLE 1 (INTERFACE=DATA FRAME): For this example, we use the
# birthweight data taken from the MASS library, and compute a parametric
# logit model and a nonparametric conditional mode model. We then
# compare their confusion matrices and summary measures of
# classification ability.
library("MASS")
data("birthwt")
attach(birthwt)
# Fit a parametric logit model with low (0/1) as the dependent
# variable and age, lwt, and smoke (0/1) as the covariates
# From ?birthwt
# 'low' indicator of birth weight less than 2.5kg
# 'smoke' smoking status during pregnancy
# 'race' mother's race ('1' = white, '2' = black, '3' = other)
# 'ht' history of hypertension
# 'ui' presence of uterine irritability
# 'ftv' number of physician visits during the first trimester
# 'age' mother's age in years
# 'lwt' mother's weight in pounds at last menstrual period
model.logit <- glm(low~factor(smoke)+
factor(race)+
factor(ht)+
factor(ui)+
ordered(ftv)+
age+
lwt,
family=binomial(link=logit))
# Generate the confusion matrix and correct classification ratio
cm <- table(low, ifelse(fitted(model.logit)>0.5, 1, 0))
ccr <- sum(diag(cm))/sum(cm)
# Now do the same with a nonparametric model...
X <- data.frame(factor(smoke),
factor(race),
factor(ht),
factor(ui),
ordered(ftv),
age,
lwt)
y <- factor(low)
# Note - this may take a few minutes depending on the speed of your
# computer...
bw <- npcdensbw(xdat=X, ydat=y)
model.np <- npconmode(bws=bw)
# Compare confusion matrices from the logit and nonparametric model
# Logit
cm
ccr
# Nonparametric
summary(model.np)
detach(birthwt)
## End(Not run)