pyinit {pyinit} | R Documentation |
PY (Pena-Yohai) initial estimates for S-estimates of regression
Description
Computes the PY initial estimates for S-estimates of regression.
Usage
pyinit(
x,
y,
intercept = TRUE,
delta = 0.5,
cc,
maxit = 10,
psc_keep,
resid_keep_method = c("threshold", "proportion"),
resid_keep_prop,
resid_keep_thresh,
eps = 1e-08,
mscale_maxit = 200,
mscale_tol = eps,
mscale_rho_fun = c("bisquare", "huber", "gauss")
)
Arguments
x |
a matrix with the data, each observation in a row. |
y |
the response vector. |
intercept |
logical, should an intercept be included in the model? Defaults to
|
delta , cc |
parameters for the M-scale estimator equation. If |
maxit |
the maximum number of iterations to perform. |
psc_keep |
proportion of observations to keep based on PSCs. |
resid_keep_method |
how to clean the data based on large residuals.
If |
resid_keep_prop , resid_keep_thresh |
see parameter
|
eps |
the relative tolerance for convergence. Defaults to |
mscale_maxit |
maximum number of iterations allowed for the M-scale
algorithm. Defaults to |
mscale_tol |
convergence threshold for the m-scale |
mscale_rho_fun |
A string containing the name of the rho
function to use for the M-scale. Valid options
are |
Value
coefficients |
numeric matrix with coefficient vectors in columns. These
are regression estimators based on "cleaned" subsets of the data. The
M-scales of the corresponding residuals are returned in the entry
|
objective |
vector of values of the M-scale estimate of the residuals
associated with each vector of regression coefficients in the columns of
|
References
Pena, D., & Yohai, V.. (1999). A Fast Procedure for Outlier Diagnostics in Large Regression Problems. Journal of the American Statistical Association, 94(446), 434-445. <doi:10.2307/2670164>
Examples
# generate a simple synthetic data set for a linear regression model
# with true regression coefficients all equal to one "(1, 1, 1, 1, 1)"
set.seed(123)
x <- matrix(rnorm(100*4), 100, 4)
y <- rnorm(100) + rowSums(x) + 1
# add masked outliers
a <- svd(var(x))$v[,4]
x <- rbind(x, t(outer(a, rnorm(20, mean=4, sd=1))))
y <- c(y, rnorm(20, mean=-2, sd=.2))
# these outliers are difficult to find
plot(lm(y~x), ask=FALSE)
# use pyinit to obtain estimated regression coefficients
tmp <- pyinit(x=x, y=y, resid_keep_method='proportion', psc_keep = .5, resid_keep_prop=.5)
# the vector of regression coefficients with smallest residuals scale
# is returned in the first column of the "coefficients" element
tmp$coefficients[,1]
# compare that with the LS estimator on the clean data
coef(lm(y~x, subset=1:100))
# compare it with the LS estimator on the full data
coef(lm(y~x))