smoothz {freqparcoord} | R Documentation |
Smoothing functions.
Description
Routines for k-Nearest Neighbor density and regression estimation, optionally using parallel computation.
Usage
smoothz(z,sf,k,checkna=TRUE,cls=NULL,nchunks=length(cls),scalefirst=FALSE)
smoothzpred(newx,oldx,oldxregest,checkna=TRUE,cls=NULL,nchunks=length(cls))
knnreg(data,k)
knndens(data,k)
Arguments
z |
The data, in data frame or matrix form. In the regression case, the response variable is assumed to be in the last column. |
sf |
Smoothing function (unquoted), e.g. knnreg or knndens. |
k |
Number of nearest neighbors. |
nchunks |
Number of chunks to break the computation into. |
newx |
New X data to predict from. |
oldx |
X-variable values in the training set. |
oldxregest |
Estimated regression values in the training set. |
checkna |
If TRUE, remove any row having at least one NA value. |
cls |
Cluster to use (see the |
data |
Data to be smoothed. |
scalefirst |
If TRUE, apply scale to the data before smoothing. |
Details
The smoothed values are calculated at the input data points
(needed in this form for another application). So, for instance, the
i-th value of the output of smoothz in the regression case is the
estimated regression function at the i-th row of z.
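As a rough illustration of what "estimated regression function at the i-th row" means, here is a minimal brute-force sketch in base R. The function name knn_reg_sketch is hypothetical and is not part of freqparcoord. It assumes Euclidean distance and counts each point as one of its own neighbors; the package's knnreg may differ on both counts.

```r
# Hypothetical helper, not part of freqparcoord: brute-force k-NN regression.
# Assumes the response is in the last column of z, as smoothz expects.
knn_reg_sketch <- function(z, k) {
  z <- as.matrix(z)
  p <- ncol(z)
  x <- z[, -p, drop = FALSE]      # predictor columns
  y <- z[, p]                     # response column
  d <- as.matrix(dist(x))         # pairwise Euclidean distances
  sapply(seq_len(nrow(x)), function(i) {
    nbrs <- order(d[i, ])[seq_len(k)]   # k closest rows (row i itself included)
    mean(y[nbrs])                       # average response over those rows
  })
}

# Toy data: two well-separated groups of three points each
z <- cbind(x = c(1, 2, 3, 10, 11, 12), y = c(1, 2, 3, 10, 11, 12))
est <- knn_reg_sketch(z, k = 3)
est   # one smoothed value per row of z: 2 2 2 11 11 11
```

The output has one entry per input row, which matches the shape described above: the i-th entry is the estimate at the i-th row of z.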
The density estimates are not normalized to have total hypervolume equal to 1.0.
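For reference only, the textbook k-nearest-neighbor density estimator is shown below. The exact form computed by knndens, and which constants it omits, is not stated on this page, so the symbols here are assumptions: n is the number of data points, d the number of variables, r_k(x) the distance from x to its k-th nearest neighbor, and c_d the volume of the unit ball in d dimensions.

```latex
\hat{f}(x) = \frac{k}{n \, c_d \, r_k(x)^{d}}
```

Dropping constant factors such as n and c_d leaves a quantity that still ranks points by local density but is no longer on a probability density scale.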
In the case of non-null nchunks, smoothing is done within-chunk
only. The smoothed value at a point will be computed only from its
neighbors in the point's chunk.
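The effect of within-chunk smoothing can be sketched in base R as follows. Both function names are hypothetical, and splitting the rows into contiguous blocks with parallel::splitIndices is an assumption made for illustration; how smoothz actually assigns rows to chunks is not documented here.

```r
# Hypothetical illustration of within-chunk smoothing; not the package's code.
# Simple smoother: mean response of the k nearest rows (row itself included).
sf_sketch <- function(z, k) {
  z <- as.matrix(z)
  p <- ncol(z)
  d <- as.matrix(dist(z[, -p, drop = FALSE]))
  sapply(seq_len(nrow(z)), function(i) mean(z[order(d[i, ])[seq_len(k)], p]))
}

# Split rows into contiguous chunks (an assumed scheme) and smooth each alone.
smooth_by_chunk <- function(z, sf, k, nchunks) {
  idx <- parallel::splitIndices(nrow(z), nchunks)
  out <- numeric(nrow(z))
  for (rows in idx) out[rows] <- sf(z[rows, , drop = FALSE], k)
  out
}

z <- cbind(x = c(1, 2, 3, 4), y = c(10, 20, 30, 40))
whole   <- sf_sketch(z, k = 2)                                # 15 15 25 35
chunked <- smooth_by_chunk(z, sf_sketch, k = 2, nchunks = 2)  # 15 15 35 35
```

Row 3 changes from 25 to 35 because its nearest neighbor in the full data (row 2) falls in a different chunk. In this sketch k must not exceed the number of rows in the smallest chunk.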
The smoothzpred function applies only to the regression case.
It is assumed that smoothz has been previously called on oldx,
yielding regression function estimates oldxregest at those points.
The smoothzpred function then finds, for each point newx[i], the
closest point oldx[j] in oldx, and uses the corresponding value
oldxregest[j] as the predicted value at newx[i].
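The lookup described above amounts to 1-nearest-neighbor prediction. A minimal base R sketch follows; nn_pred_sketch is a hypothetical name, Euclidean distance is assumed, and ties go to the first closest row. It is not the package's implementation.

```r
# Hypothetical sketch of nearest-point prediction; not the package's code.
nn_pred_sketch <- function(newx, oldx, oldxregest) {
  newx <- as.matrix(newx)
  oldx <- as.matrix(oldx)
  apply(newx, 1, function(pt) {
    d2 <- rowSums(sweep(oldx, 2, pt)^2)   # squared distances to every oldx row
    oldxregest[which.min(d2)]             # estimate at the closest training point
  })
}

oldx <- cbind(c(1, 5, 9))          # training X values
oldxregest <- c(100, 200, 300)     # regression estimates at those X values
newx <- cbind(c(0.5, 6, 8.9))      # new points to predict for
pred <- nn_pred_sketch(newx, oldx, oldxregest)
pred   # 100 200 300
```

Each new point simply inherits the estimate already computed at its closest training point, so no smoothing is redone at prediction time.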
Value
Vector of smoothed values, or in the case of smoothzpred,
a vector of predicted Y values for newx.
Author(s)
Norm Matloff <matloff@cs.ucdavis.edu> and Yingkang Xie <yingkang.xie@gmail.com>
Examples
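# programmers and engineers in Silicon Valley, 2000 census, age 25-65
library(freqparcoord)  # provides prgeng, smoothz and knnreg
library(ggplot2)       # provides ggplot, geom_smooth and aes
data(prgeng)
pg <- prgeng
pg1 <- pg[pg$age >= 25 & pg$age <= 65,]
# column 1 is age (predictor); column 8 is the response, placed last
estreg <- smoothz(pg1[,c(1,8)],sf=knnreg,k=100)
age <- pg1[,1]
p <- ggplot(data.frame(age,estreg))
p + geom_smooth(aes(x=age,y=estreg))
# peak earnings appear to occur around age 45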