R: Simulate data for Linear Models with known Parameter values...

MLtrue {RXshrink}

R Documentation

Simulate data for Linear Models with known Parameter values and Normal Errors

Description

Using specified numerical values for Parameters, such as the true error-term variance, that usually are unknown, MLtrue() creates a new data.frame that contains variables "Yhat" (the vector of expected values) and "Yvec" (Yhat + IID Normal error-terms) as well as the p given X-variables. Thus, MLtrue() produces "correct" linear models that can be ideal for analysis and bootstraping via methods based on Normal-theory.

Usage

  MLtrue(form, data, seed, go=TRUE, truv, trub, truc)

Arguments

`form`	A regression formula [y~x1+x2+...] suitable for use with lm().
`data`	Data frame containing observations on all variables in the formula.
`seed`	Seed for random number generation between 1 and 1000. When this seed number is missing in a call to MLtrue(), a new seed is generated using runif(). Error terms are then generated using calls to rnorm(), and the seed used is reported in the MLtrue output.list to enable reproducible research.
`go`	Logical value: go=TRUE starts the simulation using OLS estimates or the numerical values from the (optional) truv, trub and truc arguments to define the "Yhat" and "Yvec" variables; go=FALSE causes MLtrue() to compute OLS estimates that ignore the truv, trueb and truc arguments. Thus, go=FALSE provides a convenient way for users to Print and Examine the MLtrue() output.list before deciding which parameter-settings they wish to actually USE or MODIFY before making a final run with go=TRUE.
`truv`	Optional: numerical value for the true error variance, sigma^2.
`trub`	Optional: column vector of numerical values for the true regression coefficients.
`truc`	Optional: column vector of numerical values for the true uncorrelated components of the regression coefficient vector.

Details

RXshrink functions like eff.ridge() and qm.ridge() calculate maximum likelihood estimates ...either unbiased (BLUE) or "optimally" biased (minimum MSE risk) along a specified "path"... for typical statistical inference situations where true regression parameters are unknown. Furthermore the specified linear-model may be "incorrect" or the error-terms may not be IID Normal variates. In sharp contrast with this usual situation, the MLtrue() function generates a "Yvec" vector of outcomes from a CORRECT model of the given form that does contain IID Normal error-terms and all true parameter values are KNOWN. This makes it interesting to compare plots and output.lists from RXshrink GRR-estimation functions on a "real" data.frame with the corresponding outputs from a MLtrue() data.frame with known parameter-setting that are either identical to or "close" to those estimated from "real" data. WARNING: All output X-variables are "centered" and "rescaled" to have mean ~0 and variance 1. Yhat expected values are "centered" but usually have variance differing from 1. Yvec values are not "centered" and have variance determined by either the original data.frame or the "truv" setting.

Value

An output list object of class MLtrue:

`new`	A data.frame containing the Yvec and Yhat vectors as well as centered and rescaled versions of the input X-matrix and original Y-vector.
`Yvec`	An additional copy of Yvec pseudo-data containing Normal error-terms.
`Yhat`	An additional copy of Yhat linear fitted-values.
`seed`	Seed actually used in random-number generation; needed for replication.
`tvar`	True numerical value of sigma^2 ...[1,1] matrix.
`tbeta`	True numerical values of beta-coefficients ...[p,1] matrix.
`tcomp`	True numerical values of uncorrelated components ...[p,1] matrix.
`useb`	Logical: TRUE => tbeta vector was used in simulation. FALSE => tcomp vector was used rather than tbeta.
`data`	The name of the data.frame object specified in the second argument.
`form`	The regression formula specified in the first argument. NOTE: The final 11 output items [p, ..., xs] are identical to those in output lists from RXshrink functions like eff.ridge() and qm.ridge().

Author(s)

Bob Obenchain <wizbob@att.net>

Examples

  # require(RXshrink)
  data(mpg)
  form <- mpg~cylnds+cubins+hpower+weight
  MLtest <- MLtrue(form, mpg, go=FALSE)   # all other potential arguments "missing"...
  MLtest   # print current parameter estimates...
  cvec <- c(-0.5, 0.15, -0.16, -0.6)   # define alternative "true" components...
  MLout <- MLtrue(form, mpg, truc = cvec ) # Use "truc" input with default go=TRUE...
  str(MLout)
  formY <- Yhat~cylnds+cubins+hpower+weight   # Formula for true expected Y-outcomes...
  lmobj <- lm(formY, MLout$new)
  max(abs(lmobj$residuals))   # essentially 0 because linear model is "correct"...
  # effobj <- eff.ridge(formY, MLout$new)   ...generates "Error" because RSQUARE=1.

[Package RXshrink version 2.3 Index]