fsreg {fsdaR}

R Documentation

fsreg: an automatic outlier detection procedure in linear regression

Description

An automatic outlier detection procedure in linear regression.

Usage


fsreg(x, ...) 

## S3 method for class 'formula'
fsreg(formula, data, subset, weights, na.action,
       model = TRUE, x.ret = FALSE, y.ret = FALSE,
       contrasts = NULL, offset, ...)

## Default S3 method:
fsreg(x, y, bsb, intercept = TRUE,
      family = c("homo", "hetero", "bayes"),
      method = c("FS", "S", "MM", "LTS", "LMS"),
      monitoring = FALSE, control, trace = FALSE,
      ...)

Arguments

`formula`	a `formula` of the form `y ~ x1 + x2 + ...`.
`data`	data frame from which variables specified in `formula` are to be taken.
`subset`	an optional vector specifying a subset of observations to be used in the fitting process.
`weights`	an optional vector of weights to be used in the fitting process. NOT USED YET.
`na.action`	a function which indicates what should happen when the data contain `NA`s. The default is set by the `na.action` setting of `options`, and is `na.fail` if that is unset. The “factory-fresh” default is `na.omit`. Another possible value is `NULL`, no action. Value `na.exclude` can be useful.
`model`, `x.ret`, `y.ret`	`logical`s indicating if the model frame, the model matrix and the response are to be returned, respectively.
`contrasts`	an optional list. See the `contrasts.arg` of `model.matrix.default`.
`offset`	this can be used to specify an a priori known component to be included in the linear predictor during fitting. An `offset` term can be included in the formula instead or as well, and if both are specified their sum is used.
`family`	family of robust regression models: 'homo' for the homoscedastic (constant variance) regression model, 'hetero' for the heteroscedastic regression model, or 'bayes' for Bayesian linear regression. The default is `family='homo'`.
`method`	robust regression estimation method: 'FS' for forward search, 'S' for S regression, 'MM' for MM regression, 'LTS' for least trimmed squares, or 'LMS' for least median of squares. The default is `method='FS'`.
`monitoring`	whether to monitor several quantities at each step of the forward search, over a series of values of the breakdown point in the case of S estimates, or over a series of values of the efficiency in the case of MM estimates. Default is `monitoring=FALSE`.
`y`	Response variable, specified as a vector of length n, where n is the number of observations. Each entry in `y` is the response for the corresponding row of `x`. Missing values (`NA`) and infinite values (`Inf`) are allowed: observations (rows) with missing or infinite values are automatically excluded from the computations.
`x`	Predictor variables, specified as a matrix of explanatory variables (also called 'regressors') of dimension n x (p-1), where p denotes the number of explanatory variables including the intercept. Rows of `x` represent observations and columns represent variables. By default the model includes a constant term, so do not include a column of 1s in `x`; to fit a model without a constant term, set `intercept=FALSE`. Missing values (`NA`) and infinite values (`Inf`) are allowed: observations (rows) with missing or infinite values are automatically excluded from the computations.
`bsb`	Initial subset, given as a vector of indices. If `bsb=0` (default), the procedure starts with p randomly chosen units. Otherwise the search starts from the units in `bsb`, so that the initial subset size is `m0=length(bsb)`.
`intercept`	Indicator for the constant term, a logical scalar. If `intercept=TRUE` (default), a model with a constant term is fitted; otherwise no constant term is included.
`control`	A control object (S3) containing estimation options. If a control object is supplied, its parameters are used. Parameters that are also passed directly in the call override the corresponding elements of the control object.
`trace`	Whether to print intermediate results. Default is `trace=FALSE`.
`...`	potential further arguments passed to lower level functions.
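
The input conventions described for `x` and `y` can be illustrated with a small base R sketch. It builds a regressor matrix without a column of 1s, plants one missing and one infinite value, and flags the rows that the documented rule would exclude. This is only an illustration of that rule under assumed toy data, not the package's internal code; the name `row_ok` and the seed are arbitrary choices.

```r
set.seed(1)                                    # arbitrary seed, for reproducibility only
n <- 10
X <- matrix(rnorm(n * 2), nrow = n, ncol = 2)  # two regressors; no column of 1s
y <- rnorm(n)                                  # response as a plain vector of length n

y[3]    <- NA                                  # a missing response
X[7, 1] <- Inf                                 # an infinite regressor value

# Flag rows that are complete and finite in both X and y
# (illustrative check, not fsdaR's own implementation)
row_ok <- is.finite(y) & apply(X, 1, function(r) all(is.finite(r)))

which(!row_ok)   # rows 3 and 7 would be excluded
sum(row_ok)      # 8 observations remain
```

According to the argument descriptions, no such filtering is needed before calling `fsreg(X, y)`, since affected rows are dropped automatically; the sketch only helps to anticipate how many observations enter the fit.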

Value

The class of the returned object depends on the input parameters `family` and `method`.

Author(s)

FSDA team

References

Riani, M., Atkinson, A.C., Cerioli, A. (2009). Finding an unknown number of multivariate outliers. Journal of the Royal Statistical Society, Series B, Vol. 71, pp. 201-221.

Examples

    ## Not run: 

    library(fsdaR)
    library(robustbase)   # provides ltsReg() and the hbk data set
    
    n <- 200
    p <- 3
    
    X <- matrix(data=rnorm(n*p), nrow=n, ncol=p)
    y <- matrix(data=rnorm(n*1), nrow=n, ncol=1)
    (out = fsreg(X, y))

    ## Now we use the formula interface:
    (out1 = fsreg(y~X, control=FSR_control(plot=FALSE)))

    ## Or use the variables in a data frame
    (out2 = fsreg(Y~., data=hbk, control=FSR_control(plot=FALSE)))

    ## Let us compare with the LTS solution (robustbase is already loaded)
    (out3 = ltsReg(Y~., data=hbk))
    
    ## Now compute the model without intercept
    (out4 = fsreg(Y~.-1, data=hbk, control=FSR_control(plot=FALSE)))
    
    ## And compare again with the LTS solution
    (out5 = ltsReg(Y~.-1, data=hbk))

    ## Passing optional arguments through the control object
    (out6 = fsreg(Y~.-1, data=hbk, control=FSR_control(plot=FALSE, nsamp=1500, h=50)))
    
## End(Not run)
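
The examples above pass options through a control object, and the description of `control` states that parameters given directly in the call override the matching control elements. A minimal base R sketch of that precedence rule follows. The functions `toy_control` and `resolve_options` are hypothetical stand-ins written for this illustration, and their default values are arbitrary; they are not part of fsdaR and do not claim to reproduce how `fsreg` merges its options internally.

```r
# Hypothetical control constructor with arbitrary toy defaults
toy_control <- function(plot = TRUE, nsamp = 1000) {
  list(plot = plot, nsamp = nsamp)
}

# Hypothetical resolver: options named in the call replace
# the corresponding elements of the control object
resolve_options <- function(control = toy_control(), ...) {
  overrides <- list(...)
  utils::modifyList(control, overrides)
}

ctrl <- toy_control(plot = FALSE, nsamp = 1500)
opts <- resolve_options(control = ctrl, nsamp = 500)

opts$plot    # FALSE, taken from the control object
opts$nsamp   # 500, the value passed in the call wins
```

Under this reading, supplying an option both in the control object and in the call is not an error; the value in the call is the one that takes effect.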

[Package fsdaR version 0.9-0 Index]