R: Generates OLS Data and Confidence/Prediction Intervals for...

repeat.sample {desk}

R Documentation

Generates OLS Data and Confidence/Prediction Intervals for Repeated Samples

Description

This command simulates repeated samples, given fixed data for the exogenous predictors and given (true) regression parameters. For each generated sample, an OLS regression with a level parameter is estimated, and confidence intervals (CIs) as well as prediction intervals are calculated.
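
To make the description concrete, the following base-R sketch walks through what a single repetition involves: draw normal errors, build y from the true parameters, estimate by OLS, and compute both interval types. It uses only `lm()`, `confint()` and `predict()` from the stats package; the internals of `repeat.sample()` are not shown on this page, so this illustrates the idea and is not the package's implementation. The object names (`X`, `fit`, `ci`, `pred.int`) are chosen for illustration only.

```r
# Base-R sketch of ONE repetition (assumption: not the actual desk internals)
set.seed(1)                                  # make the error draw replicable
x1 = c(1, 2, 3, 4, 5)                        # fixed exogenous data
x2 = c(2, 4, 5, 5, 6)
true.par = c(alpha = 2, beta1 = 1, beta2 = 4)

X = cbind(1, x1, x2)                         # design matrix incl. level column
u = rnorm(nrow(X), mean = 0, sd = 1)         # normally distributed errors
y = drop(X %*% true.par) + u                 # true line plus error

fit = lm(y ~ x1 + x2)                        # OLS with level parameter
ci = confint(fit, level = 0.95)              # corresponds to sig.level = 0.05
pred.int = predict(fit, newdata = data.frame(x1 = x1, x2 = x2),
                   interval = "prediction", level = 0.95)
```

Repeating these steps `rep` times with fresh error draws, while holding `x` fixed, gives the kind of collection of estimates and intervals described under Value.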

Usage

repeat.sample(
  x,
  true.par,
  omit = 0,
  mean = 0,
  sd = 1,
  rep = 100,
  xnew = x,
  sig.level = 0.05,
  seed = NULL
)

Arguments

`x`	(n x k) vector or matrix of exogenous data, where each column represents the data of one of k exogenous predictors. The number of rows represents the sample size n.
`true.par`	vector of true parameters in the linear model (level and slope parameters). If `true.par` is a vector without named elements then coefficients are named "alpha", "beta1", "beta2", ..., "betak" by default. Otherwise the names specified are used.
`omit`	vector of indices identifying the exogenous variables to be omitted in the true model, e.g. `omit = 1` corresponds to the first exogenous variable to be omitted. This argument can be used to illustrate omitted variable bias in parameter and standard error estimates. Default value is `omit = 0`, i.e. no exogenous variable is omitted.
`mean`	expected value of the normal distribution of the error term.
`sd`	standard deviation of the normal distribution of the error term. Used only for generating simulated y-values. Interval estimators use the estimated sigma.
`rep`	repetitions, i.e. number of simulated samples. The samples in each matrix generated have enumerated names "SMPL1", "SMPL2", ..., "SMPLs".
`xnew`	(t x k) matrix of new exogenous data points at which prediction intervals should be calculated. t corresponds to the number of new data points, k to the number of exogenous variables in the model. If not specified, the values in `x` are used (see first argument).
`sig.level`	significance level for confidence and prediction intervals.
`seed`	optionally set random seed to arbitrary number if results should be made replicable.
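
The omitted variable bias that `omit` is meant to illustrate can be worked out by hand for small data. The sketch below applies the textbook formula, bias = (X'X)^(-1) X' x2 * beta2, where X holds the regressors that remain in the model. It uses the data from the Examples section and assumes the second predictor is dropped; whether `bias.coef` is computed in exactly this way is an assumption, since the page does not state the formula. The names `X.kept` and `bias` are illustrative.

```r
# Textbook omitted variable bias in base R (assumption: illustrative only,
# not taken from the desk source code)
x1 = c(1, 2, 3, 4, 5)
x2 = c(2, 4, 5, 5, 6)                        # the variable to be omitted
beta2 = 4                                    # its true slope parameter

X.kept = cbind(alpha = 1, beta1 = x1)        # regressors remaining in the model
# Coefficients from regressing x2 on X.kept, scaled by beta2
bias = drop(solve(crossprod(X.kept), crossprod(X.kept, x2))) * beta2
bias                                         # alpha: 6.8, beta1: 3.6
```

Under this formula, the expected slope estimate on `x1` after dropping `x2` is 1 + 3.6 = 4.6 for true parameters `c(2,1,4)`, which is why the mean of the biased estimates in the Examples is far from 1.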

Details

Let X be an object generated by repeat.sample(). Then plot(X, ...) accepts the following arguments:

`plot.what = "confint"`	plot stacked confidence intervals for all samples. Additional arguments are `center = TRUE` (plot center of intervals?), `which.coef = 2` (intervals for which coefficient?), `center.size = 1` (size of the center dot), `lwd = 1` (line width).
`plot.what = "reglines"`	plot regression lines of all samples.
`plot.what = "scatter"`	plot scatter plots of all samples.

Value

A list of named data structures. Let s = number of samples, n = sample size, k = number of coefficients, and t = number of new data points in xnew. Then:

`x`	(n x k matrix): copy of data of exogenous regressors that was passed to the function.
`y`	(n x s matrix): simulated real y values in each sample.
`fitted`	(n x s matrix): estimated y values in each sample.
`coef`	(k x s matrix): estimated parameters in each sample.
`true.par`	(k vector): vector of true parameter values (implemented only for `plot.confint()`).
`u`	(n x s matrix): random error term in each sample.
`residuals`	(n x s matrix): residuals of OLS estimations in each sample.
`sig.squ`	(s vector): estimated variance of the error term in each sample.
`var.u`	(s vector): variance of random errors drawn in each sample.
`se`	(k x s matrix): estimated standard deviation of the coefficients in each sample.
`vcov.coef`	(k x k x s array): estimated variance-covariance matrix of the coefficients in each sample.
`confint`	(k x 2 x s array): confidence intervals of the coefficients in each sample. Interval bounds are named "lower" and "upper".
`outside.ci`	(k vector): percentage of confidence intervals not covering the true value for each of the regression parameters.
`y0`	(t x s matrix): simulated real future y values at `xnew` in each sample (real line plus real error).
`y0.fitted`	(t x s matrix): point prediction, i.e. estimated y values at `xnew` in each sample (regression line).
`predint`	(t x 2 x s array): prediction intervals of future endogenous realizations at exogenous data points specified by `xnew`. Intervals are calculated for each sample, respectively. Interval bounds are named "lower" and "upper".
`sd.pe`	(t x s matrix): estimated standard deviation of prediction errors at all exogenous data points in each sample.
`outside.pi`	(t vector): percentage of prediction intervals not covering the true value `y0` at `xnew`.
`bias.coef`	(k vector): true bias in parameter estimators if variables are omitted (argument `omit` unequal to zero).
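
As an illustration of how a quantity like `outside.ci` relates to `confint` and `true.par`, the sketch below counts, for each coefficient, the share of intervals that miss the true value. `share.outside()` is a hypothetical helper written for this page, not a function of the desk package, and the small array is hand-made toy data laid out like the documented (k x 2 x s) structure. Reporting the result on a 0 to 100 scale is also an assumption.

```r
# Hypothetical helper (not part of desk): percentage of CIs missing the truth
share.outside = function(confint, true.par) {
  k = dim(confint)[1]
  sapply(seq_len(k), function(j) {
    # TRUE where the true value lies below the lower or above the upper bound
    miss = true.par[j] < confint[j, "lower", ] | true.par[j] > confint[j, "upper", ]
    100 * mean(miss)
  })
}

# Toy (k = 2) x 2 x (s = 4) array mimicking the documented layout
ci = array(NA_real_, dim = c(2, 2, 4),
           dimnames = list(c("alpha", "beta1"), c("lower", "upper"),
                           paste0("SMPL", 1:4)))
ci["alpha", "lower", ] = c(1.0, 1.5, 2.5, 0.5)
ci["alpha", "upper", ] = c(3.0, 2.5, 3.5, 1.8)
ci["beta1", "lower", ] = c(0.5, 0.8, 0.2, 0.9)
ci["beta1", "upper", ] = c(1.5, 1.2, 1.1, 1.4)

share.outside(ci, true.par = c(2, 1))        # 50 for alpha, 0 for beta1
```

With the package loaded, the same helper could be applied to `out$confint` and `out$true.par` and compared with `out$outside.ci`, keeping in mind that the scale of the reported value may differ.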

Examples

## Generate data of two predictors
x1 = c(1,2,3,4,5)
x2 = c(2,4,5,5,6)
x = cbind(x1,x2)

## Generate list of data structures and name it "out"
out = repeat.sample(x, true.par = c(2,1,4), rep = 10)

## Extract some data
out$coef[2,8] # Extract estimated beta1 (i.e. 2nd coef) in the 8th sample
out$coef["beta1","SMPL8"] # Same as above using internal names
out$confint["beta1","upper","SMPL5"] # Extract only upper bound of CI of beta1 from 5th sample
out$confint[,,5] # Extract CIs (upper and lower bound) for all parameters from 5th sample
out$confint[,,"SMPL5"] # Same as above using internal names
out$confint["beta1",,"SMPL5"] # Extract CI of beta1 from 5th sample
out$residuals[,"SMPL7"] # Extract residuals from OLS estimation of sample 7

## Generate prediction intervals at three specified points of exogenous data (xnew)
out = repeat.sample(x, true.par = c(2,1,4), rep = 10,
      xnew = cbind(x1 = c(1.5,6,7), x2 = c(1,3,5.5)))
out$predint[,,6] # Prediction intervals at the three data points of xnew in 6th sample
out$sd.pe[,6] # Estimated standard deviations of prediction errors in 6th sample
out$outside.pi # Percentage of prediction intervals that miss the true y0 realization

## Illustrate that the relative share of cases in which the interval does not cover the
## true value approaches the significance level
out = repeat.sample(x, true.par = c(2,1,4), rep = 1000)
out$outside.ci

## Illustrate omitted variable bias
out.unbiased = repeat.sample(x, true.par = c(2,1,4))
mean(out.unbiased$coef["beta1",]) # approx. equal to beta1 = 1
out.biased = repeat.sample(x, true.par = c(2,1,4), omit = 2) # omit x2
mean(out.biased$coef["beta1",]) # not approx. equal to beta1 = 1
out.biased$bias.coef # show the true bias in coefficients

## Simulate a regression with given correlation structure in exogenous data
corr.mat = cbind(c(1, 0.9),c(0.9, 1)) # Generate desired corr. structure (high correlation)
X = makedata.corr(n = 10, k = 2, CORR = corr.mat) # Generate 10 obs. of 2 exogenous variables
out = repeat.sample(X, true.par = c(2,1,4), rep = 1) # Simulate a regression
out$vcov.coef

## Illustrate confidence intervals
out = repeat.sample(c(10, 20, 30,50), true.par = c(0.2,0.13), rep = 10, seed = 12)
plot(out, plot.what = "confint")

## Plot confidence intervals of alpha with specified xlim values
plot(out, plot.what = "confint", which.coef = 1, xlim = c(-15,15))

## Illustrate normality of dependent variable
out = repeat.sample(c(10,30,50), true.par = c(0.2,0.13), rep = 200)
plot(out, plot.what = "scatter")

## Illustrate confidence bands in a regression
plot(out, plot.what = "reglines")

[Package desk version 1.1.1 Index]