R: Fit a Regression-Scale Model

rsm {marg}

R Documentation

Fit a Regression-Scale Model

Description

Produces an object of class rsm which is a regression-scale model fit of the data.

Usage

rsm(formula = formula(data), family = gaussian, 
    data = sys.frame(sys.parent()), dispersion = NULL, 
    weights = NULL, subset = NULL, na.action = na.fail, 
    offset = NULL, method = "rsm.surv", 
    control = glm.control(maxit=100, trace=FALSE), 
    model = FALSE, x = FALSE, y = TRUE, contrasts = NULL, ...)

Arguments

`formula`	a formula expression as for other linear regression models, of the form `response ~ predictors` where the predictors are separated by suitable operators. See the documentation of `lm` and `formula` for details.
`family`	a `family.rsm` object, i.e. a list of functions and expressions characterizing the error distribution. Families supported are `gaussian`, `student` (Student's t), `extreme` (Gumbel or extreme value), `logistic`, `logWeibull`, `logExponential`, `logRayleigh` and `Huber` (Huber's least favourable). These represent calls to the corresponding generator functions. The calls to `gaussian`, `extreme`, `logistic`, `logWeibull`, `logExponential` and `logRayleigh` can be given without parentheses. The functions `student` and `Huber` may take as argument respectively the degrees of freedom (`df`) and the tuning constant (`k`). Users can construct their own families, as long as they have components compatible with those given in `rsm.distributions`. The demonstration file ‘margdemo.R’ that ships with the package shows how to create a new generator function. The default is `gaussian`.
`data`	an optional data frame in which to interpret the variables occurring in the model formula, or in the `subset` and the `weights` arguments. If this is missing, then the variables in the formula should be on the search list.
`dispersion`	if `NULL`, the scale parameter is taken to be unknown. If known, the numerical value can be passed. The default is `NULL`. Huber's least favourable distribution is a special case: if `dispersion` is `NULL`, the maximum likelihood estimate is computed, whereas if it is `TRUE`, the MAD estimate is calculated and the scale parameter is fixed at this value in subsequent computations.
`weights`	the optional weights for the fitting criterion. If supplied, the response variable and the covariates are multiplied by the weights in the IRLS algorithm. The length of the `weights` argument must be the same as the number of observations. The weights must be nonnegative and it is strongly recommended that they be strictly positive, since zero weights are ambiguous, compared to use of the `subset` argument.
`subset`	expression saying which subset of the rows of the data should be used in the fit. This can be a logical vector (which is replicated to have length equal to the number of observations), or a numeric vector indicating which observation numbers are to be included, or a character vector of the row names to be included. All observations are included by default.
`na.action`	a function to filter missing data. This is applied to the model frame after any `subset` argument has been used. The default (with `na.fail`) is to create an error if any missing value is found. A possible alternative is `na.omit`, which deletes observations that contain one or more missing values.
`offset`	this can be used to specify an a priori known component to be included in the linear predictor during fitting. An `offset` term can be included in the formula instead or as well, and if both are specified their sum is used. Defaults to `NULL`.
`method`	the fitting method to be used; the default shown under Usage is `"rsm.surv"`, with `"rsm.fit"` as the alternative. The method `model.frame` simply returns the model frame.
`control`	a list of iteration and algorithmic constants. See `glm.control` for their names and default values.
`model`	if `TRUE`, the model frame is returned; default is `FALSE`.
`x`	if `TRUE`, the model matrix is returned; default is `FALSE`.
`y`	if `TRUE`, the response variable is returned; default is `TRUE`.
`contrasts`	a list of contrasts to be used for some or all of the factors appearing as variables in the model formula. The names of the list should be the names of the corresponding variables, and the elements should either be contrast-type matrices (matrices with as many rows as levels of the factor and with columns linearly independent of each other and of a column of ones), or else they should be functions that compute such contrast matrices.
`...`	absorbs any additional argument.
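
As an illustration of how `subset`, `na.action`, `offset` and `contrasts` are interpreted, the following sketch uses only base R (`model.frame` and `model.matrix`) on a small made-up data frame. It does not call `rsm`; the data frame `d` and its columns are hypothetical, and the sketch assumes that `rsm` builds its model frame in the usual `lm`-like way described above.

```r
## Base R only: a made-up data frame with one missing response
d <- data.frame(y = c(1.2, 2.3, NA, 4.1, 5.0, 6.2),
                x = c(0.1, 0.4, 0.5, 0.9, 1.3, 1.7),
                g = factor(c("a", "b", "c", "a", "b", "c")))

## subset is applied first, then na.action filters the remaining rows
mf <- model.frame(y ~ x + g + offset(2 * x), data = d,
                  subset = x > 0.2, na.action = na.omit)
nrow(mf)             # one row dropped by subset, one by na.omit

## a named list of contrasts: one entry per factor, here a function
X <- model.matrix(attr(mf, "terms"), mf,
                  contrasts.arg = list(g = contr.sum))
colnames(X)

## the a priori known component 2 * x for the retained rows
model.offset(mf)
```

With `na.action = na.fail` (the default listed under Usage) the same call to `model.frame` would stop with an error because of the missing response.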

Details

The model is fitted using Iteratively Reweighted Least Squares, IRLS for short (Green, 1984; Jorgensen, 1984). The working response and iterative weights are computed using the functions contained in the family.rsm object.
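
To make the idea of iterative reweighting concrete, here is a minimal base R sketch for a regression-scale model with Student t errors. It is an EM-type reweighting scheme written for illustration, not the code used inside rsm.fit or rsm.surv; the function name `irls_t` and the simulated data are assumptions introduced here.

```r
## Minimal IRLS sketch for Student t errors (illustrative, not rsm.fit).
irls_t <- function(X, y, df = 5, maxit = 100, tol = 1e-8) {
  start <- lm.fit(X, y)                      # ordinary least squares start
  beta  <- start$coefficients
  sigma <- sqrt(mean(start$residuals^2))
  for (it in seq_len(maxit)) {
    r <- (y - drop(X %*% beta)) / sigma      # standardized residuals
    w <- (df + 1) / (df + r^2)               # iterative weights at current fit
    step <- lm.wfit(X, y, w)                 # weighted least squares step
    beta_new  <- step$coefficients
    sigma_new <- sqrt(mean(w * step$residuals^2))
    done <- max(abs(beta_new - beta)) < tol && abs(sigma_new - sigma) < tol
    beta <- beta_new
    sigma <- sigma_new
    if (done) break
  }
  ## w holds the weights used in the last weighted least squares step
  list(coefficients = beta, scale = sigma, weights = w, iterations = it)
}

## simulated data, for illustration only
set.seed(1)
x <- seq(0, 1, length.out = 50)
y <- 1 + 2 * x + 0.3 * rt(50, df = 5)
X <- cbind(Intercept = 1, x = x)
fit_t <- irls_t(X, y, df = 5)
fit_t$coefficients
fit_t$scale
```

Observations with large standardized residuals receive small weights, which is what makes the t fit less sensitive to outliers than least squares.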

The two workhorses of rsm are rsm.fit and rsm.surv, which expect an X and Y argument rather than a formula. The first function is used for the families student with df < 3 and Huber; the second one, based on the survreg.fit routine for fitting parametric survival models, is used in case of extreme, logistic, logWeibull, logExponential, logRayleigh and student (with df > 2) error distributions. In the presence of a user-defined error distribution the rsm.fit routine is used. The rsm.null function is invoked to fit an empty (null) model.

The details are given in Brazzale (2000, Section 6.3.1).

Value

An object of class rsm is returned, which inherits from glm and lm.

The output can be examined by print, summary, rsm.diag.plots and anova. Components can be extracted using fitted, residuals, formula and family. It can be modified using update. It has most of the components of a glm object, with a few more. See rsm.object for further details.

Note

In case of extreme, logistic, logWeibull, logExponential, logRayleigh and student (with df > 2) error distributions, both methods, rsm.fit and rsm.surv, can be used to fit the model; the Usage section shows "rsm.surv" as the default value of method. There are, however, examples where one of the two algorithms (most likely the one invoked by rsm.surv) breaks down. If this is the case, refit the model with the alternative method.

The message "negative iterative weights returned!" is issued if some of the iterative weights (q2 component of the fitted rsm object) are negative. These weights would be used by default by the rsm.diag routine for the definition of residuals and regression diagnostics. In order to avoid missing values (NAs), the default weighting scheme "observed" automatically switches to "score" unless otherwise specified.
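
A short base R sketch may help to see why negative weights can arise. It assumes, as a labelled assumption rather than a statement about the package internals, that "observed" weights are proportional to the second derivative of g(r) = -log f(r), where f is the error density. For Student t errors with nu degrees of freedom this derivative is (nu + 1)(nu - r^2) / (nu + r^2)^2; the function name `g2_student` is hypothetical.

```r
## Second derivative of -log f(r) for Student t errors with nu degrees of
## freedom. Assumption: "observed" weights are proportional to this quantity.
g2_student <- function(r, nu) (nu + 1) * (nu - r^2) / (nu + r^2)^2

r <- c(0, 1, 2, 3)                 # standardized residuals
w_obs <- g2_student(r, nu = 3)
w_obs

## the sign changes once r^2 exceeds nu, i.e. for large residuals
r[w_obs < 0]
```

Under this assumption, a few large standardized residuals are enough to produce negative values with heavy-tailed families such as student with small df.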

References

Brazzale, A. R. (2000) Practical Small-Sample Parametric Inference. Ph.D. Thesis N. 2230, Department of Mathematics, Swiss Federal Institute of Technology Lausanne.

Green, P. J. (1984) Iteratively reweighted least squares for maximum likelihood estimation, and some robust and resistant alternatives (with Discussion). J. R. Statist. Soc. B, 46, 149–192.

Jorgensen, B. (1984) The delta algorithm and GLIM. Int. Stat. Rev., 52, 283–300.

Examples

## House Price Data
data(houses)
houses.rsm <- rsm(price ~ ., family = student(5), data = houses)
## model fit including all covariates
houses.rsm <- rsm(price ~ ., family = student(5), data = houses, 
                  method = "rsm.fit", control = glm.control(trace = TRUE))
## prints information about the iterative procedure at each iteration
update(houses.rsm, ~ . - bdroom + offset(7 * bdroom))
## "bdroom" is included as offset variable with fixed (= 7) coefficient

## Sea Level Data
data(venice)
attach(venice)
Year <- 1:51/51
venice.2.rsm <- rsm(sea ~ Year + I(Year^2), family = extreme)
## quadratic model fitted to sea level data
venice.1.rsm <- update(venice.2.rsm, ~. - I(Year^2))
## linear model fit
##
c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11)
c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62)
venice.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, 
                  family = extreme)
## includes 18.62-year astronomical tidal cycle and 11-year sunspot cycle
venice.11.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11, family = extreme)
venice.19.rsm <- rsm(sea ~ Year + I(Year^2) + c19 + s19, family = extreme)
## includes only one of the two cycles (11-year or 18.62-year)
##
## comparison of linear, quadratic and periodic (11-year, 19-year) models 
plot(year, sea, ylab = "sea level") 
lines(year, fitted(venice.1.rsm))
lines(year, fitted(venice.2.rsm), col="red")
lines(year, fitted(venice.11.rsm), col="blue")
lines(year, fitted(venice.19.rsm), col="green")
##
detach()

## Darwin's Data on Growth Rates of Plants
data(darwin)
darwin.rsm <- rsm(cross - self ~ pot - 1, family = student(3), 
                  data = darwin)
## Maximum likelihood estimates
darwin.rsm <- rsm(cross - self ~ pot - 1, family = Huber, data = darwin)
## M-estimates

[Package marg version 1.2-2.1 Index]