rbin {SimCorMultRes} | R Documentation |
Simulating Correlated Binary Responses Conditional on a Marginal Model Specification
Description
Simulates correlated binary responses assuming a regression model for the marginal probabilities.
Usage
rbin(clsize = clsize, intercepts = intercepts, betas = betas,
xformula = formula(xdata), xdata = parent.frame(), link = "logit",
cor.matrix = cor.matrix, rlatent = NULL)
Arguments
clsize |
integer indicating the common cluster size. |
intercepts |
numerical (or numeric vector of length |
betas |
numerical vector or matrix containing the value of the marginal
regression parameter vector associated with the covariates (i.e., excluding
|
xformula |
formula expression as in other marginal regression models but without including a response variable. |
xdata |
optional data frame containing the variables provided in
|
link |
character string indicating the link function in the marginal
model. Options include |
cor.matrix |
matrix indicating the correlation matrix of the
multivariate normal distribution when the NORTA method is employed
( |
rlatent |
matrix with |
Details
The formulae are easier to read from either the Vignette or the Reference Manual (both available here).
The assumed marginal model is
Pr(Y_{it} = 1 | x_{it}) = F(\beta_{t0} + \beta^{'}_{t} x_{it}),
where F is the cumulative distribution function determined by link. For subject i, Y_{it} is the t-th binary response and x_{it} is the associated covariate vector. Finally, \beta_{t0} and \beta_{t} are the intercept and regression parameter vector at the t-th measurement occasion.
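As a rough illustration of the marginal model, the sketch below evaluates Pr(Y_{it} = 1 | x_{it}) for a single subject at four occasions using base R only. All numerical values are made up for illustration, and the two link functions shown are the ones that appear in the Usage and Examples sections.

```r
# Marginal success probabilities Pr(Y_it = 1 | x_it) = F(beta_t0 + beta_t' x_it)
# for one subject observed at four occasions (all values are illustrative).
beta_t0 <- c(0, 0.1, 0.2, 0.3)   # occasion-specific intercepts (assumed)
beta_t  <- rep(0.2, 4)           # slope at each occasion (assumed)
x_it    <- c(-1, 0, 0.5, 2)      # covariate values (assumed)

linear_predictor <- beta_t0 + beta_t * x_it

# F is the CDF implied by the link: logistic for "logit", normal for "probit".
prob_logit  <- plogis(linear_predictor)
prob_probit <- pnorm(linear_predictor)

round(cbind(linear_predictor, prob_logit, prob_probit), 3)
```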
The binary response Y_{it} is obtained by extending the approach of Emrich and Piedmonte (1991) as suggested in Touloumis (2016).
When \beta_{t0}=\beta_{0} for all t, intercepts should be provided as a single number. Otherwise, intercepts must be provided as a numeric vector such that the t-th element corresponds to the intercept at measurement occasion t.
betas should be provided as a numeric vector only when \beta_{t}=\beta for all t. Otherwise, betas must be provided as a numeric matrix with clsize rows such that the t-th row contains the value of \beta_{t}. In either case, betas should reflect the order of the terms implied by xformula.
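The shapes described above can be sketched in base R as follows. The covariate names x1 and x2, the assumed formula ~ x1 + x2 and all parameter values are hypothetical and serve only to show the constant and the occasion-specific forms side by side.

```r
cluster_size <- 4

# Constant intercept: a single number. Time-varying: one element per occasion.
intercepts_constant <- 0
intercepts_varying  <- c(-0.5, 0, 0.5, 1)

# Assume xformula = ~ x1 + x2, so each beta_t has two elements (x1, then x2).
betas_constant <- c(0.2, -0.4)

# Time-varying effects: one row per occasion, columns in xformula order.
betas_varying <- matrix(
  c(0.2, -0.4,
    0.2, -0.3,
    0.3, -0.2,
    0.3, -0.1),
  nrow = cluster_size, byrow = TRUE,
  dimnames = list(paste0("t", 1:cluster_size), c("x1", "x2"))
)

# Basic shape checks before passing these objects on.
stopifnot(nrow(betas_varying) == cluster_size,
          length(intercepts_varying) == cluster_size)
betas_varying
```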
The appropriate use of xformula is xformula = ~ covariates, where covariates indicates the linear predictor as in other marginal regression models.
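One way to preview the term order that betas should follow is to expand a one-sided formula with base R's model.matrix. This is only a sketch: whether rbin builds its design matrix in exactly this way is an assumption, and the data frame and variable names below are hypothetical.

```r
# Hypothetical covariates: one numeric variable and one two-level factor.
xdata_demo <- data.frame(x1 = c(0.5, -1, 2),
                         x2 = factor(c("a", "b", "a")))

# One-sided formula: no response on the left-hand side.
xformula_demo <- ~ x1 + x2

# Term labels in the order in which they appear in the formula.
term_order <- attr(terms(xformula_demo), "term.labels")

# Expanded design columns; the intercept column is listed first here, but the
# intercept itself is supplied separately through the intercepts argument.
design_columns <- colnames(model.matrix(xformula_demo, data = xdata_demo))

term_order      # "x1" "x2"
design_columns  # "(Intercept)" "x1" "x2b"
```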
The optional argument xdata should be provided in “long” format.
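A minimal sketch of what “long” format means here, using base R only: each row holds one subject-occasion pair rather than one subject. The column names id, time and x are illustrative, and the subject-then-occasion row ordering is an assumption based on how the covariate x is constructed in the Examples section.

```r
set.seed(1)
sample_size  <- 3
cluster_size <- 4

# Hypothetical wide layout: one row per subject, one column per occasion.
wide_x <- matrix(rnorm(sample_size * cluster_size), nrow = sample_size)

# Long layout: one row per subject-occasion pair, ordered by subject, then time.
long <- data.frame(
  id   = rep(1:sample_size, each = cluster_size),
  time = rep(1:cluster_size, times = sample_size),
  x    = as.vector(t(wide_x))   # transpose so values are read subject by subject
)

head(long, cluster_size)   # the four rows that belong to subject 1
```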
The NORTA method is the default option for simulating the latent random vectors denoted by e^{B}_{it} in Touloumis (2016). To import simulated values for the latent random vectors without utilizing the NORTA method, the user can employ the rlatent argument. In this case, element (i,t) of rlatent represents the realization of e^{B}_{it}.
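The idea behind NORTA (NORmal To Anything) can be sketched in base R: draw correlated normal vectors, then transform each margin to the distribution implied by the link. The code below is a simplified illustration and not the package's implementation. The thresholding rule in step 3 is an assumption chosen to be consistent with the marginal model above, and all parameter values are made up.

```r
set.seed(42)
sample_size  <- 2000
cluster_size <- 4
cor_matrix   <- toeplitz(c(1, 0.9, 0.9, 0.9))

# Step 1: correlated standard normal vectors, one row per subject.
z <- matrix(rnorm(sample_size * cluster_size), nrow = sample_size) %*%
  chol(cor_matrix)

# Step 2: map each margin to the standard logistic distribution (logit link)
# through the probability integral transform.
latent <- qlogis(pnorm(z))

# Step 3 (assumed rule): Y_it = 1 when the latent value does not exceed the
# linear predictor, so that Pr(Y_it = 1) = F(linear predictor).
x <- rnorm(sample_size)
linear_predictor <- 0 + 0.2 * matrix(x, nrow = sample_size, ncol = cluster_size)
y <- (latent <= linear_predictor) * 1L

dim(latent)   # sample_size rows, cluster_size columns: element (i, t) is e_it
colMeans(y)   # each should be close to mean(plogis(0.2 * x))
```

A matrix laid out like latent, with one row per subject and one column per measurement occasion, matches the (i,t) indexing described for rlatent.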
Value
Returns a list that has components:
Ysim |
the simulated binary
responses. Element ( |
simdata |
a data frame that includes the simulated
response variables (y), the covariates specified by |
rlatent |
the latent random variables denoted by
|
Author(s)
Anestis Touloumis
References
Cario, M. C. and Nelson, B. L. (1997) Modeling and generating random vectors with arbitrary marginal distributions and correlation matrix. Technical Report, Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois.
Emrich, L. J. and Piedmonte, M. R. (1991) A method for generating high-dimensional multivariate binary variates. The American Statistician 45, 302–304.
Li, S. T. and Hammond, J. L. (1975) Generation of pseudorandom numbers with specified univariate distributions and correlation coefficients. IEEE Transactions on Systems, Man and Cybernetics 5, 557–561.
Touloumis, A. (2016) Simulating Correlated Binary and Multinomial Responses under Marginal Model Specification: The SimCorMultRes Package. The R Journal 8, 79–91.
See Also
rmult.bcl for simulating correlated nominal responses, rmult.clm, rmult.crm and rmult.acl for simulating correlated ordinal responses.
Examples
## See Example 3.5 in the Vignette.
library(SimCorMultRes)
set.seed(123)
sample_size <- 5000
cluster_size <- 4
beta_intercepts <- 0
beta_coefficients <- 0.2
latent_correlation_matrix <- toeplitz(c(1, 0.9, 0.9, 0.9))
## Time-stationary covariate: one draw per subject, repeated at each occasion.
x <- rep(rnorm(sample_size), each = cluster_size)
simulated_binary_dataset <- rbin(clsize = cluster_size,
    intercepts = beta_intercepts, betas = beta_coefficients,
    xformula = ~x, cor.matrix = latent_correlation_matrix, link = "probit")
## Fit a marginal probit model to the simulated data.
library(gee)
binary_gee_model <- gee(y ~ x, family = binomial("probit"), id = id,
    data = simulated_binary_dataset$simdata)
summary(binary_gee_model)$coefficients
## See Example 3.6 in the Vignette.
## Reuses sample_size, cluster_size, beta_intercepts, beta_coefficients and x
## from the previous example.
set.seed(8)
library(evd)
simulated_latent_variables1 <- rmvevd(sample_size, dep = sqrt(1 - 0.9),
    model = "log", d = cluster_size)
simulated_latent_variables2 <- rmvevd(sample_size, dep = sqrt(1 - 0.9),
    model = "log", d = cluster_size)
## Differences of the two draws are passed as latent variables via rlatent,
## bypassing the NORTA method.
simulated_latent_variables <- simulated_latent_variables1 -
    simulated_latent_variables2
simulated_binary_dataset <- rbin(clsize = cluster_size,
    intercepts = beta_intercepts, betas = beta_coefficients,
    xformula = ~x, rlatent = simulated_latent_variables)
binary_gee_model <- gee(y ~ x, family = binomial("logit"), id = id,
    data = simulated_binary_dataset$simdata)
summary(binary_gee_model)$coefficients