regress {mStats}                R Documentation

Linear Regression Model

Description

regress() produces a summary of the model with coefficients and 95% confidence intervals.

`predict.regress` is an S3 method for predict() that generates statistics related to the prediction of the linear model, using the output from the regress function of the mStats package.

`plot.regress` is an S3 method for plot() that creates graphs for checking the diagnostics of the linear model, using the output from the regress function of the mStats package.

`ladder` converts a variable into a normally distributed one.

`hettest` performs the Breusch-Pagan test for heteroskedasticity. It presents evidence against the null hypothesis that t = 0 in Var(e) = sigma^2 * exp(z t). The calculation is based on the bptest function in the lmtest package.

`linkTest` determines whether a model in R is 'well specified', following Stata's linktest command.

Usage

regress(model, vce = FALSE, digits = 5)

## S3 method for class 'regress'
predict(object, ...)

## S3 method for class 'regress'
plot(x, ...)

ladder(data, var)

hettest(regress, studentize = FALSE)

linkTest(model, vce = FALSE, digits = 5)

Arguments

model

glm or lm model

vce

if TRUE, robust standard errors are calculated.

digits

specify rounding of numbers. See round.

object

an object of class 'regress' (output from regress) for which prediction is desired.

...

additional arguments affecting the predictions produced.

x

an object of class 'regress' (output from regress) to be plotted.

data

dataset

var

variable name

regress

output from regress

studentize

logical. If set to TRUE, Koenker's studentized version of the test statistic will be used.

Details

regress is based on lm and uses lm() to fit the model. All statistics presented in the function's output are derived from lm, except the AIC value, which is obtained from AIC().

Outputs

Outputs can be divided into three parts.

  1. Info of the model: provides the number of observations (Obs.), the F value, the p-value from the F test, the R Squared value, the Adjusted R Squared value, the square root of the mean square error (Root MSE) and the AIC value.

  2. Errors: the output from anova(model) is tabulated here. SS, DF and MS indicate the sum of squares of errors, the degrees of freedom and the mean square of errors.

  3. Regression Output: coefficients from the model summary are tabulated here along with the 95% confidence interval.
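The three parts above can be reproduced with base R roughly as follows (a sketch using the airquality model from the Examples; the exact layout of the regress tables may differ):

fit <- lm(Ozone ~ Wind, data = airquality)

## 1. Info of the model
s <- summary(fit)
s$fstatistic                     # F value and its degrees of freedom
c(s$r.squared, s$adj.r.squared)  # R Squared, Adjusted R Squared
s$sigma                          # Root MSE
AIC(fit)                         # AIC value

## 2. Errors
anova(fit)                       # SS, DF and MS

## 3. Regression Output
cbind(coef(s), confint(fit))     # coefficients with 95% confidence intervals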

Using Robust Standard Errors

If heteroskedasticity is present in the data, the ordinary least squares (OLS) estimator remains unbiased and consistent, but it is no longer efficient. The estimated OLS standard errors are biased, and this bias cannot be removed by a larger sample size. To remedy this, robust standard errors can be used to adjust the standard errors.

regress uses the sandwich estimator to compute Huber-White standard errors. The calculation is based on the tutorial by Kevin Goulding.

Var(Robust) = (N / (N - K)) (X'X)^(-1) \sum(X_i X_i' e_i^2) (X'X)^(-1)

where N is the number of observations and K is the number of regressors (including the intercept). This returns a variance-covariance (VCV) matrix whose diagonal elements are the estimated heteroskedasticity-robust coefficient variances, which are the ones of interest. The estimated coefficient standard errors are the square roots of these diagonal elements.
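This calculation can be sketched directly in base R. Assuming it corresponds to the HC1-type estimator, the result should also agree with sandwich::vcovHC(model, type = "HC1"):

fit <- lm(Ozone ~ Wind, data = airquality)
X <- model.matrix(fit)          # N x K design matrix
e <- residuals(fit)
N <- nrow(X); K <- ncol(X)
XtXinv <- solve(crossprod(X))   # (X'X)^(-1)
meat   <- crossprod(X * e)      # sum of X_i X_i' e_i^2
vcv    <- (N / (N - K)) * XtXinv %*% meat %*% XtXinv
sqrt(diag(vcv))                 # robust standard errors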

`predict.regress` returns the original data augmented with statistics for model diagnostics (a base-R sketch follows the list):

  1. fitted (Fitted values)

  2. resid (Residuals)

  3. std.resid (Studentized Residuals)

  4. hat (leverage)

  5. sigma

  6. cooksd (Cook's Distance)
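A base-R sketch of these quantities (the exact definitions used by predict.regress are assumed here, e.g. rstudent() for std.resid and lm.influence()$sigma for sigma):

fit <- lm(Ozone ~ Wind, data = airquality)
dx <- data.frame(
  fitted    = fitted(fit),              # fitted values
  resid     = residuals(fit),           # residuals
  std.resid = rstudent(fit),            # studentized residuals
  hat       = hatvalues(fit),           # leverage
  sigma     = lm.influence(fit)$sigma,  # residual sd with each observation dropped
  cooksd    = cooks.distance(fit)       # Cook's distance
)
head(dx)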

The Breusch-Pagan test fits a linear regression model to the residuals of a linear regression model (by default the same explanatory variables are taken as in the main regression model) and rejects if too much of the variance is explained by the additional explanatory variables. Under H_0, the test statistic of the Breusch-Pagan test follows a chi-squared distribution with degrees of freedom equal to the number of regressors (without the constant) in the model.
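The studentized (Koenker) variant can be sketched as an auxiliary regression of the squared residuals on the explanatory variables, comparing n * R^2 against a chi-squared distribution; lmtest::bptest() implements this properly:

fit <- lm(Ozone ~ Wind, data = airquality)
aux <- lm(residuals(fit)^2 ~ Wind, data = model.frame(fit))
stat <- nobs(fit) * summary(aux)$r.squared  # LM statistic
pchisq(stat, df = 1, lower.tail = FALSE)    # df = regressors without the constant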

The code for `linkTest` has been modified from Keith Chamberlain's linktest (www.ChamberlainStatistics.com, https://gist.github.com/KeithChamberlain/8d9da515e73a27393effa3c9fe571c3f).
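As a rough illustration of the idea (a sketch of Stata's linktest, not necessarily the exact code used by linkTest), the outcome is regressed on the fitted values and their square; a significant squared term suggests misspecification:

fit  <- lm(Ozone ~ Wind, data = airquality)
d    <- data.frame(model.frame(fit), hat = fitted(fit))
link <- lm(Ozone ~ hat + I(hat^2), data = d)
summary(link)$coefficients  # check the I(hat^2) row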

Value

a list containing

  1. info - info and error tables

  2. reg - regression table

  3. model - raw model output from lm()

  4. fit - formula for fitting the model

  5. lbl - variable labels for further processing in summary.
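Assuming standard list access, the components can be inspected directly, for example:

reg <- regress(lm(Ozone ~ Wind, data = airquality))
reg$reg    # regression table
reg$model  # raw model output from lm()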

Note

Credits to Kevin Goulding, The Tarzan Blog.

Author(s)

Myo Minn Oo

Email: dr.myominnoo@gmail.com

Website: https://myominnoo.github.io/

References

T.S. Breusch & A.R. Pagan (1979), A Simple Test for Heteroscedasticity and Random Coefficient Variation. Econometrica 47, 1287–1294.

R. Koenker (1981), A Note on Studentizing a Test for Heteroscedasticity. Journal of Econometrics 17, 107–112.

W. Krämer & H. Sonnberger (1986), The Linear Regression Model under Test. Heidelberg: Physica.

Examples


fit <- lm(Ozone ~ Wind, data = airquality)
regress(fit)

## Not run: 
## labelling variables
airquality2 <- label(airquality, Ozone = "Ozone level", Wind = "Wind Speed")
fit2 <- lm(Ozone ~ Wind, data = airquality2)
reg <- regress(fit2)
str(reg)

## End(Not run)


## Not run: 
predict(reg)

## End(Not run)


## Not run: 
plot(reg)

## End(Not run)


ladder(airquality, Ozone)


## Not run: 
hettest(reg)

## End(Not run)


## Not run: 
linkTest(fit)

## End(Not run)

