beta_gen {lsasim} | R Documentation |
Generate regression coefficients
Description
Uses the output from questionnaire_gen to generate linear regression coefficients.
Usage
beta_gen(
data,
MC = FALSE,
MC_replications = 100,
CI = c(0.005, 0.995),
output_cov = FALSE,
rename_to_q = FALSE,
verbose = TRUE
)
Arguments
data |
output from the questionnaire_gen function |
MC |
if |
MC_replications |
for |
CI |
confidence interval for Monte Carlo simulations |
output_cov |
if |
rename_to_q |
if |
verbose |
if 'FALSE', output messages will be suppressed (useful for simulations). Defaults to 'TRUE' |
Details
This function was primarily conceived as a sub-function of questionnaire_gen, when family = "gaussian", theta = TRUE, and full_output = TRUE. However, it can also be directly called by the user so they can perform further analysis.
This function primarily calculates the true regression coefficients (\beta) for the linear influence of the background questionnaire variables on \theta. From a statistical perspective, this relationship can be modeled as follows, where E(\theta | \boldsymbol{X}, \boldsymbol{W}) is the expectation of \theta given \boldsymbol{X} = \{X_1, \ldots, X_P\} and \boldsymbol{W} = \{W_1, \ldots, W_Q\}:
E(\theta | \boldsymbol{X}, \boldsymbol{W}) = \beta_0 + \sum_{p = 1}^P \beta_p X_p + \sum_{q = 1}^Q \beta_{P + q} W_q
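For a linear model of this form, the coefficients are determined by the means and covariances of \theta and the predictors. The following is a minimal base R sketch of that standard least-squares relationship, restricted to continuous predictors. It is not necessarily the code path used inside beta_gen, and the covariance matrix, means and all object names are hypothetical values chosen only for illustration.

```r
# Hypothetical covariance matrix of (theta, X1, X2); illustrative values only
vars <- c("theta", "X1", "X2")
Sigma <- matrix(c(1.0, 0.5, 0.3,
                  0.5, 1.0, 0.2,
                  0.3, 0.2, 1.0),
                nrow = 3, byrow = TRUE, dimnames = list(vars, vars))
mu <- c(theta = 0, X1 = 1, X2 = 2)  # hypothetical means

# Slopes: solve Sigma_XX %*% beta = Sigma_Xtheta
Sigma_XX <- Sigma[c("X1", "X2"), c("X1", "X2")]
Sigma_Xtheta <- Sigma[c("X1", "X2"), "theta"]
slopes <- as.vector(solve(Sigma_XX, Sigma_Xtheta))

# Intercept: mean of theta minus the slopes applied to the means of X
intercept <- unname(mu["theta"]) - sum(slopes * mu[c("X1", "X2")])
true_beta <- c(intercept, slopes)
names(true_beta) <- c("(Intercept)", "X1", "X2")

# Sample-based check: draw multivariate normal data and fit ordinary least squares
set.seed(1)
n <- 5000
Z <- matrix(rnorm(n * 3), ncol = 3)
draws <- sweep(Z %*% chol(Sigma), 2, mu, "+")
colnames(draws) <- vars
fit <- lm(theta ~ X1 + X2, data = as.data.frame(draws))

print(round(cbind(true = true_beta, sample = coef(fit)), 3))
```

The "true" column does not depend on the simulated sample, whereas the "sample" column changes with the seed and approaches the true values as n grows.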
The regression coefficients are calculated using the true covariance matrix either provided by the user upon calling questionnaire_gen or randomly generated by that function if none was provided. In any case, that matrix is not sample-dependent, though it should be similar to the one observed in the generated data (especially for larger samples). One convenient way to check for this similarity is by running the function with MC = TRUE, which generates Monte Carlo estimates of the coefficients for comparison; the MC_replications argument can then be increased to improve the estimates at an often-noticeable cost in processing time. If MC = FALSE, the MC_replications argument has no effect on the results. In any case, each subsample will always have the same size as the original sample.
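The idea of such a Monte Carlo check can be sketched in base R as follows. This is an illustration only: it regenerates each subsample from a hypothetical covariance matrix, which is an assumption and may differ from how beta_gen builds its subsamples. The object names (one_fit, estimates, summary_mc) and all numeric values are likewise hypothetical; the quantile bounds mirror the default CI = c(0.005, 0.995).

```r
# Hypothetical covariance matrix of (theta, X1, X2), all with zero mean
Sigma <- matrix(c(1.0, 0.5, 0.3,
                  0.5, 1.0, 0.2,
                  0.3, 0.2, 1.0), nrow = 3, byrow = TRUE)
R <- chol(Sigma)   # upper-triangular factor with t(R) %*% R equal to Sigma

n <- 100           # size of every subsample (same as the "original" sample)
reps <- 200        # number of Monte Carlo replications

# Draw one subsample of size n and return its OLS coefficients
one_fit <- function() {
  d <- as.data.frame(matrix(rnorm(n * 3), ncol = 3) %*% R)
  names(d) <- c("theta", "X1", "X2")
  coef(lm(theta ~ X1 + X2, data = d))
}

set.seed(3)
estimates <- replicate(reps, one_fit())   # 3 x reps matrix of coefficients

# Monte Carlo mean and 99% interval for each coefficient
summary_mc <- cbind(
  MC = rowMeans(estimates),
  t(apply(estimates, 1, quantile, probs = c(0.005, 0.995)))
)
print(round(summary_mc, 3))
```

Increasing reps narrows the uncertainty of the Monte Carlo mean but lengthens the run time roughly in proportion, since one regression is fitted per replication.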
If the background questionnaire contains categorical variables (W), the original covariance matrix cannot be used because it contains the covariances involving Z ~ N(0, 1), which is the random variable that gets categorized into W. The case where W is always binomial is trivial, but if at least one W has more than two categories, the structure of the covariance matrix changes drastically. In this case, this function recalculates all covariances between \theta, X and each category of W using some auxiliary internal functions which rely on the appropriate distribution (either multivariate normal or truncated normal). To avoid multicollinearity, the first category of each W is dropped before the regression coefficients are calculated.
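To make the role of the categorization concrete, the sketch below cuts a standard normal Z into a three-category W, dummy-codes it with the first category dropped, and compares the empirical covariance between \theta and each remaining category with a closed-form value. It does not reproduce the package's internal functions: the correlation rho, the thresholds in cuts and all object names are hypothetical, and the closed form used, Cov(\theta, 1(a < Z <= b)) = \rho (\phi(a) - \phi(b)), is the standard result for a bivariate standard normal pair.

```r
set.seed(2)
n <- 20000
rho <- 0.6                     # hypothetical correlation between theta and Z

# Bivariate standard normal (theta, Z) with correlation rho
z <- rnorm(n)
theta <- rho * z + sqrt(1 - rho^2) * rnorm(n)

# Categorize Z into W at hypothetical thresholds
cuts <- c(-Inf, -0.5, 0.8, Inf)
w <- cut(z, breaks = cuts, labels = c("1", "2", "3"))

# Dummy-code W; dropping the intercept column leaves indicators for
# categories 2 and 3, so category 1 acts as the reference
dummies <- model.matrix(~ w)[, -1]

# Closed-form covariance between theta and each indicator 1(a < Z <= b)
analytic <- rho * (dnorm(cuts[2:3]) - dnorm(cuts[3:4]))
empirical <- as.vector(cov(theta, dummies))

print(round(rbind(analytic = analytic, empirical = empirical), 3))
```

Keeping all three indicators alongside an intercept would make the design matrix rank-deficient, because the indicators sum to one for every observation; dropping the first category removes that redundancy.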
Value
By default, this function will output a vector of the regression coefficients, including the intercept. If MC = TRUE, the output will instead be a matrix comparing the true regression coefficients obtained from the covariance matrix with expected values obtained from a Monte Carlo simulation, complete with a confidence interval (99% under the default CI).
If output_cov = TRUE, the output will be a list with two elements: the first one, betas, will contain the same output described in the previous paragraph. The second one, called vcov_YXW, contains the covariance matrix of the regression coefficients.
Note
The equation on this page is best formatted in PDF. We recommend issuing 'help("beta_gen", help_type = "PDF")' in your terminal and opening the 'beta_gen.pdf' file generated in your working directory. You may also set 'help_type = "HTML"', but the equations will have degraded formatting.
See Also
questionnaire_gen
Examples
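library(lsasim)
# Simulate 100 respondents with two X variables and three W variables (2, 2 and 4 categories)
data <- questionnaire_gen(100, family = "gaussian", theta = TRUE,
                          full_output = TRUE, n_X = 2, n_W = list(2, 2, 4))
# Compare the true regression coefficients with Monte Carlo estimates
beta_gen(data, MC = TRUE)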