R: Generate regression coefficients

beta_gen {lsasim}

R Documentation

Generate regression coefficients

Description

Uses the output from questionnaire_gen to generate linear regression coefficients.

Usage

beta_gen(
  data,
  MC = FALSE,
  MC_replications = 100,
  CI = c(0.005, 0.995),
  output_cov = FALSE,
  rename_to_q = FALSE,
  verbose = TRUE
)

Arguments

`data`	output from the `questionnaire_gen` function with `full_output = TRUE` and `theta = TRUE`
`MC`	if `TRUE`, performs Monte Carlo simulation to estimate regression coefficients
`MC_replications`	for `MC = TRUE`, this represents the number of Monte Carlo subsamples calculated
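`CI`	lower and upper probability bounds of the confidence interval for Monte Carlo simulations; the default `c(0.005, 0.995)` corresponds to a 99% interval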
`output_cov`	if `TRUE`, will also output the covariance matrix of YXW
`rename_to_q`	if `TRUE`, renames the variables from "x" and "w" to "q"
`verbose`	if `FALSE`, output messages will be suppressed (useful for simulations). Defaults to `TRUE`

Details

This function was primarily conceived as a sub-function of questionnaire_gen, when family = "gaussian", theta = TRUE, and full_output = TRUE. However, it can also be directly called by the user so they can perform further analysis.

This function calculates the true regression coefficients (\beta) for the linear influence of the background questionnaire variables on \theta. From a statistical perspective, this relationship can be modeled as follows, where E(\theta | \boldsymbol{X}, \boldsymbol{W}) is the expectation of \theta given \boldsymbol{X} = \{X_1, \ldots, X_P\} and \boldsymbol{W} = \{W_1, \ldots, W_Q\}:

E(\theta | \boldsymbol{X}, \boldsymbol{W}) = \beta_0 + \sum_{p = 1}^P \beta_p X_p + \sum_{q = 1}^Q \beta_{P + q} W_q
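As an illustration of how coefficients of this form follow from a covariance matrix, the sketch below applies the standard population least-squares identities (slopes from solving \Sigma_{XX} \beta = \Sigma_{X\theta}, intercept from the means) to a small made-up example. The matrix `vcov_yx`, the vector `means` and all numeric values are hypothetical, and this is not claimed to be the internal implementation of beta_gen.

```r
# Hypothetical covariance matrix of (theta, X1, X2); all values are made up.
vcov_yx <- matrix(c(1.0, 0.5, 0.3,
                    0.5, 1.0, 0.2,
                    0.3, 0.2, 1.0),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("theta", "x1", "x2"),
                                  c("theta", "x1", "x2")))
means <- c(theta = 0, x1 = 1, x2 = 2)  # hypothetical means

# Slopes: solve Sigma_XX %*% beta = Sigma_X,theta
sigma_xx <- vcov_yx[-1, -1]
sigma_xy <- vcov_yx[-1, 1]
slopes <- solve(sigma_xx, sigma_xy)

# Intercept: E(theta) minus the slopes applied to the covariate means
intercept <- means[["theta"]] - sum(slopes * means[-1])

betas <- c("(Intercept)" = intercept, slopes)
print(round(betas, 4))
```

Because the coefficients depend only on the covariance matrix and the means, they do not change from one generated sample to the next.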

The regression coefficients are calculated using the true covariance matrix, which is either provided by the user when calling questionnaire_gen or randomly generated by that function if none was provided. In either case, that matrix is not sample-dependent, though it should be similar to the one observed in the generated data (especially for larger samples). One convenient way to check for this similarity is to run the function with MC = TRUE, which will generate a numeric estimate; the MC_replications argument can then be increased to improve the estimates at an often-noticeable cost in processing time. If MC = FALSE, MC_replications has no effect on the results. Each subsample always has the same size as the original sample.
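The idea behind such a Monte Carlo check can be sketched in base R as follows. This is a simplified stand-in rather than the code beta_gen runs: the covariance matrix is made up, the variables are all continuous with zero mean, and the names `n_obs`, `replications`, `ci` and `one_replication` are hypothetical (the middle two only mirror the roles of MC_replications and CI).

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Hypothetical covariance matrix of (theta, X1, X2); values are made up.
sigma <- matrix(c(1.0, 0.5, 0.3,
                  0.5, 1.0, 0.2,
                  0.3, 0.2, 1.0), nrow = 3, byrow = TRUE)
true_slopes <- solve(sigma[-1, -1], sigma[-1, 1])

n_obs <- 100           # size of each subsample
replications <- 200    # plays the role of MC_replications
ci <- c(0.005, 0.995)  # plays the role of CI

# Draw one zero-mean multivariate normal sample and fit a linear regression
one_replication <- function() {
  z <- matrix(rnorm(n_obs * 3), ncol = 3)
  d <- as.data.frame(z %*% chol(sigma))  # rows now have covariance sigma
  names(d) <- c("theta", "x1", "x2")
  coef(lm(theta ~ x1 + x2, data = d))[-1]  # keep the slopes only
}

estimates <- replicate(replications, one_replication())  # 2 x replications

comparison <- cbind(
  true     = true_slopes,
  MC       = rowMeans(estimates),
  CI_lower = apply(estimates, 1, quantile, probs = ci[1]),
  CI_upper = apply(estimates, 1, quantile, probs = ci[2])
)
print(round(comparison, 3))
```

Raising `replications` narrows the gap between the `true` and `MC` columns, at the price of fitting more regressions.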

If the background questionnaire contains categorical variables (W), the original covariance matrix cannot be used because it contains the covariances involving Z ~ N(0, 1), which is the random variable that gets categorized into W. The case where W is always binomial is trivial, but if at least one W has more than two categories, the structure of the covariance matrix changes drastically. In this case, this function recalculates all covariances between \theta, X and each category of W using auxiliary internal functions which rely on the appropriate distribution (either multivariate normal or truncated normal). To avoid multicollinearity, the first category of each W is dropped before the regression coefficients are calculated.
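The reason for dropping one category can be seen with a small base R sketch. The factor `w` below is hypothetical; the example only illustrates the general dummy-coding issue and does not use any lsasim internals.

```r
# Hypothetical categorical variable with three categories
w <- factor(c(1, 2, 3, 2, 1, 3))

# Keeping every category: the three indicator columns add up to the intercept
all_categories <- cbind("(Intercept)" = 1, model.matrix(~ w - 1))
qr(all_categories)$rank  # smaller than ncol(all_categories): rank-deficient

# Dropping the first category (R's default treatment contrasts) gives full rank
first_dropped <- model.matrix(~ w)
colnames(first_dropped)
qr(first_dropped)$rank
```

With the first category removed, each remaining coefficient is interpreted relative to that reference category.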

Value

By default, this function outputs a vector of the regression coefficients, including the intercept. If MC = TRUE, the output is instead a matrix comparing the true regression coefficients obtained from the covariance matrix with the expected values obtained from a Monte Carlo simulation, complete with a confidence interval (99% under the default CI).

If output_cov = TRUE, the output will be a list with two elements: the first, betas, contains the same output described in the previous paragraph; the second, vcov_YXW, contains the covariance matrix of YXW.

Note

The equation on this page is best formatted in PDF. We recommend issuing 'help("beta_gen", help_type = "PDF")' in your terminal and opening the 'beta_gen.pdf' file generated in your working directory. You may also set 'help_type = "HTML"', but the equation will have degraded formatting.

Examples


library(lsasim)
data <- questionnaire_gen(100, family = "gaussian", theta = TRUE,
                          full_output = TRUE, n_X = 2, n_W = list(2, 2, 4))
# Compare the covariance-based coefficients with Monte Carlo estimates
beta_gen(data, MC = TRUE)

[Package lsasim version 2.1.5 Index]