Linear Regression Model


`regress` produces a summary of the model with coefficients and 95% confidence intervals.

`predict.regress` is an S3 method for `predict` that generates statistics related to the prediction of the linear model, using the output from the `regress` function of the mStats package.

`plot.regress` is an S3 method for `plot` that creates graphs for checking the diagnostics of a linear model, using the output from the `regress` function of the mStats package.

`ladder` converts a variable into a normally distributed one.

`hettest` performs the Breusch-Pagan test for heteroskedasticity. It presents evidence against the null hypothesis that t = 0 in Var(e) = sigma^2 exp(zt). The formulas are based on the `bptest` function in the lmtest package.

`linkTest` determines whether a model in R is 'well specified', following Stata's linktest.


regress(model, vce = FALSE, digits = 5)

## S3 method for class 'regress'
predict(object, ...)

## S3 method for class 'regress'
plot(x, ...)

ladder(data, var)

hettest(regress, studentize = FALSE)

linkTest(model, vce = FALSE, digits = 5)



`model`: glm or lm model


`vce`: if TRUE, robust standard errors are calculated.


`digits`: specify rounding of numbers. See round.


`object`: a model object for which prediction is desired.


`...`: additional arguments affecting the predictions produced.


`x`: the coordinates of points in the plot. Alternatively, a single plotting structure, function or any R object with a plot method can be provided.


`var`: variable name


`regress`: output from regress


`studentize`: logical. If set to TRUE, Koenker's studentized version of the test statistic will be used.


`regress` is based on `lm`, which it uses to run the model. All statistics presented in the function's output are derived from `lm`, except the AIC value, which is obtained from `AIC`.


The output can be divided into three parts.

  1. Info of the model: the number of observations (Obs.), F value, p-value from the F test, R Squared value, Adjusted R Squared value, square root of the mean square error (Root MSE) and AIC value.

  2. Errors: the output from anova(model) is tabulated here. SS, DF and MS indicate the sum of squares, degrees of freedom and mean squares.

  3. Regression Output: coefficients from the summary of the model are tabulated here along with their 95% confidence intervals.
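The three parts above can be reproduced with base R alone. The following is a minimal sketch, not the code used inside `regress`; the object names (`info`, `errors`, `reg_table`) are illustrative assumptions.

```r
# Fit the same kind of model that regress() expects
fit <- lm(Ozone ~ Wind, data = airquality)
s <- summary(fit)

# Part 1: model information
f <- s$fstatistic  # named vector: value, numdf, dendf
info <- c(
  obs           = nobs(fit),
  f_value       = unname(f["value"]),
  p_value       = unname(pf(f["value"], f["numdf"], f["dendf"],
                            lower.tail = FALSE)),
  r_squared     = s$r.squared,
  adj_r_squared = s$adj.r.squared,
  root_mse      = s$sigma,
  aic           = AIC(fit)
)

# Part 2: sums of squares, degrees of freedom and mean squares
errors <- anova(fit)[, c("Sum Sq", "Df", "Mean Sq")]

# Part 3: coefficient table with 95% confidence intervals
reg_table <- cbind(coef(s), confint(fit, level = 0.95))
```

Only the AIC value comes from a separate call (`AIC`), which matches the description of how `regress` obtains it.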

Using Robust Standard Errors

If heteroskedasticity is present in the data sample, the ordinary least squares (OLS) estimator remains unbiased and consistent, but is no longer efficient. The estimated OLS standard errors are biased, and a larger sample size does not fix this. To remedy this, robust standard errors can be used in place of the OLS standard errors.

`regress` uses a sandwich estimator to compute Huber-White standard errors. The calculation is based on the tutorial by Kevin Goulding.

$$\mathrm{Var}_{\text{robust}} = \frac{N}{N - K}\,(X'X)^{-1} \left( \sum_{i=1}^{N} X_i X_i' e_i^2 \right) (X'X)^{-1}$$

where N = number of observations, and K = the number of regressors (including the intercept). This returns a Variance-covariance (VCV) matrix where the diagonal elements are the estimated heteroskedasticity-robust coefficient variances — the ones of interest. Estimated coefficient standard errors are the square root of these diagonal elements.
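The formula can be evaluated directly with base R matrix operations. This is a sketch under the assumption of a plain `lm` fit; it is not the code inside `regress`, and the names `bread`, `meat` and `vcv_robust` are illustrative.

```r
fit <- lm(Ozone ~ Wind, data = airquality)

X <- model.matrix(fit)  # N x K design matrix, intercept included
e <- residuals(fit)     # OLS residuals e_i
N <- nrow(X)
K <- ncol(X)

bread <- solve(crossprod(X))  # (X'X)^(-1)
# Multiplying X by e scales row i by e_i, so the cross-product
# equals the sum over i of X_i X_i' e_i^2
meat <- crossprod(X * e)

vcv_robust <- (N / (N - K)) * bread %*% meat %*% bread

# Robust standard errors: square roots of the diagonal elements
robust_se <- sqrt(diag(vcv_robust))
robust_se
```

The result can be compared with the ordinary standard errors from `sqrt(diag(vcov(fit)))` to see how much the heteroskedasticity correction changes them.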

`predict.regress` returns the original data augmented with statistics for model diagnostics:

  1. fitted (Fitted values)

  2. resid (Residuals)

  3. std.resid (Studentized Residuals)

  4. hat (leverage)

  5. sigma

  6. cooksd (Cook's Distance)
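A comparable table can be assembled from base R influence measures. This sketch rests on two assumptions that the text does not settle: that `std.resid` corresponds to `rstandard()` (internally studentized residuals) and that `sigma` is the leave-one-out residual standard deviation returned by `influence()`. It is not the implementation of `predict.regress`.

```r
fit <- lm(Ozone ~ Wind, data = airquality)

# model.frame() keeps only the rows actually used in the fit,
# so every diagnostic vector has a matching length
diag_df <- cbind(
  model.frame(fit),
  fitted    = fitted(fit),
  resid     = residuals(fit),
  std.resid = rstandard(fit),         # assumption: internally studentized
  hat       = hatvalues(fit),         # leverage
  sigma     = influence(fit)$sigma,   # assumption: leave-one-out sigma
  cooksd    = cooks.distance(fit)
)

head(diag_df)
```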

The Breusch-Pagan test fits a linear regression model to the residuals of a linear regression model (by default, the same explanatory variables are used as in the main regression model) and rejects the null hypothesis if too much of the variance is explained by the additional explanatory variables. Under H_0, the test statistic follows a chi-squared distribution with degrees of freedom equal to the number of regressors in the model, excluding the constant.
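The test statistic can be computed by hand from an auxiliary regression. The sketch below is modeled on the computation used by `bptest` in lmtest; it is not the source of `hettest`, and the helper `bp_stat` is a hypothetical name introduced for illustration.

```r
fit <- lm(Ozone ~ Wind, data = airquality)

Z <- model.matrix(fit)   # same explanatory variables as the main model
e <- residuals(fit)
n <- nrow(Z)
sigma2 <- sum(e^2) / n   # maximum likelihood estimate of the error variance

# Hypothetical helper: returns the test statistic for either variant
bp_stat <- function(studentize = FALSE) {
  if (studentize) {
    # Koenker's version: n times the R squared of the auxiliary regression
    w <- e^2 - sigma2
    aux <- lm.fit(Z, w)
    n * sum(aux$fitted.values^2) / sum(w^2)
  } else {
    # Original version: half the explained sum of squares
    f <- e^2 / sigma2 - 1
    aux <- lm.fit(Z, f)
    0.5 * sum(aux$fitted.values^2)
  }
}

df <- ncol(Z) - 1        # regressors excluding the constant
bp <- bp_stat(studentize = FALSE)
p_value <- pchisq(bp, df = df, lower.tail = FALSE)
c(BP = bp, df = df, p = p_value)
```

A small p-value is evidence against constant error variance.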

The code for `linkTest` has been modified from Keith Chamberlain's linktext.
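The idea behind Stata's linktest is to refit the response on the fitted values and their squares: if the model is well specified, the squared term should have no explanatory power. The following base R sketch illustrates that idea only; it is not the code of `linkTest`, and the column names `hat` and `hatsq` are illustrative.

```r
fit <- lm(Ozone ~ Wind, data = airquality)

link_df <- data.frame(
  y   = model.response(model.frame(fit)),  # response actually used in the fit
  hat = fitted(fit)                        # linear prediction
)
link_df$hatsq <- link_df$hat^2             # squared linear prediction

# Auxiliary regression of the response on hat and hatsq
link_fit <- lm(y ~ hat + hatsq, data = link_df)
link_tab <- coef(summary(link_fit))
link_tab

# A small p-value for hatsq suggests the model is misspecified
p_hatsq <- link_tab["hatsq", "Pr(>|t|)"]
```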


a list containing

  1. info - info and error tables

  2. reg - regression table

  3. model - raw model output from lm()

  4. fit - formula for fitting the model

  5. lbl - variable labels for further processing in summary.


Credits to Kevin Goulding, The Tarzan Blog.





T.S. Breusch & A.R. Pagan (1979), A Simple Test for Heteroscedasticity and Random Coefficient Variation. Econometrica 47, 1287–1294.

R. Koenker (1981), A Note on Studentizing a Test for Heteroscedasticity. Journal of Econometrics 17, 107–112.

W. Krämer & H. Sonnberger (1986), The Linear Regression Model under Test. Heidelberg: Physica.


fit <- lm(Ozone ~ Wind, data = airquality)

## Not run: 
## labelling variables
airquality2 <- label(airquality, Ozone = "Ozone level", Wind = "Wind Speed")
fit2 <- lm(Ozone ~ Wind, data = airquality2)
reg <- regress(fit2)

## End(Not run)


ladder(airquality, Ozone)


