rbin {SimCorMultRes}    R Documentation

Simulating Correlated Binary Responses Conditional on a Marginal Model Specification

Description

Simulates correlated binary responses assuming a regression model for the marginal probabilities.

Usage

rbin(clsize = clsize, intercepts = intercepts, betas = betas,
  xformula = formula(xdata), xdata = parent.frame(), link = "logit",
  cor.matrix = cor.matrix, rlatent = NULL)

Arguments

clsize

integer indicating the common cluster size.

intercepts

numeric value (or numeric vector of length clsize) containing the intercept(s) of the marginal model.

betas

numeric vector or matrix containing the values of the marginal regression parameters associated with the covariates (i.e., excluding intercepts).

xformula

formula expression as in other marginal regression models but without including a response variable.

xdata

optional data frame containing the variables provided in xformula.

link

character string indicating the link function in the marginal model. Options include 'probit', 'logit', 'cloglog', 'cauchit' or 'identity'. Required when rlatent = NULL.

cor.matrix

matrix indicating the correlation matrix of the multivariate normal distribution when the NORTA method is employed (rlatent = NULL).

rlatent

matrix with clsize columns containing realizations of the latent random vectors when the NORTA method is not used. See Details for more information.
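Because the NORTA method draws from a multivariate normal distribution with correlation matrix cor.matrix, that matrix must be a proper correlation matrix. A minimal base R check is sketched below, assuming that "proper" means clsize x clsize, symmetric, with unit diagonal and strictly positive definite. The helper is_valid_cor_matrix is hypothetical and not part of SimCorMultRes.

```r
# Hypothetical helper (not exported by SimCorMultRes): returns TRUE when m is a
# clsize x clsize symmetric matrix with unit diagonal and positive eigenvalues.
is_valid_cor_matrix <- function(m, clsize) {
  is.matrix(m) &&
    all(dim(m) == c(clsize, clsize)) &&
    isSymmetric(m) &&
    all(abs(diag(m) - 1) < 1e-8) &&
    all(eigen(m, symmetric = TRUE, only.values = TRUE)$values > 0)
}

# Exchangeable structure with correlation 0.9, as in the Examples section
exchangeable <- toeplitz(c(1, 0.9, 0.9, 0.9))
is_valid_cor_matrix(exchangeable, clsize = 4)

# Equal negative correlations of -0.9 among 3 occasions are not attainable
is_valid_cor_matrix(toeplitz(c(1, -0.9, -0.9)), clsize = 3)
```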

Details

The formulae are easier to read in the package Vignette or the Reference Manual.

The assumed marginal model is

Pr(Y_{it} = 1 | x_{it}) = F(\beta_{t0} + \beta^{'}_{t} x_{it})

where F is the cumulative distribution function determined by link. For subject i, Y_{it} is the t-th binary response and x_{it} is the associated covariate vector. Finally, \beta_{t0} and \beta_{t} are the intercept and the regression parameter vector at the t-th measurement occasion.
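For illustration, the marginal probability F(\beta_{t0} + \beta^{'}_{t} x_{it}) can be computed for each link option with base R distribution functions. The helper marginal_prob below is hypothetical, and the identity link only yields a valid probability when the linear predictor lies in [0, 1].

```r
# Hypothetical helper: returns F(eta) for the link options listed under 'link'.
marginal_prob <- function(eta,
                          link = c("logit", "probit", "cloglog", "cauchit", "identity")) {
  link <- match.arg(link)
  switch(link,
    logit    = plogis(eta),           # standard logistic cdf
    probit   = pnorm(eta),            # standard normal cdf
    cloglog  = 1 - exp(-exp(eta)),    # inverse complementary log-log
    cauchit  = pcauchy(eta),          # standard Cauchy cdf
    identity = eta                    # only meaningful when 0 <= eta <= 1
  )
}

# Linear predictor beta_t0 + beta_t * x_it for a single covariate
beta_t0 <- 0
beta_t <- 0.2
x_it <- c(-1, 0, 1)
eta <- beta_t0 + beta_t * x_it
marginal_prob(eta, "probit")
```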

The binary response Y_{it} is obtained by extending the approach of Emrich and Piedmonte (1991) as suggested in Touloumis (2016).
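To see why thresholding a latent variable reproduces the marginal model, assume the latent threshold representation Y_{it} = 1 if and only if e^{B}_{it} <= \beta_{t0} + \beta^{'}_{t} x_{it}, where e^{B}_{it} has cdf F. Then Pr(Y_{it} = 1 | x_{it}) = F(\beta_{t0} + \beta^{'}_{t} x_{it}). The Monte Carlo sketch below checks this for the logit link using independent latent draws, so it illustrates the marginal part only, not the within-cluster correlation.

```r
set.seed(1)
n <- 100000
eta <- 0.2                    # linear predictor beta_t0 + beta_t' x_it
e_B <- rlogis(n)              # latent draws with cdf F = logistic (logit link)
y <- as.integer(e_B <= eta)   # assumed threshold rule: Y = 1 iff e_B <= eta

# Empirical success rate versus the target marginal probability F(eta)
c(empirical = mean(y), target = plogis(eta))
```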

When \beta_{t0} = \beta_{0} for all t, intercepts should be provided as a single number. Otherwise, intercepts must be provided as a numeric vector such that the t-th element corresponds to the intercept at measurement occasion t.
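The two accepted forms of intercepts can be pictured as expanding to one intercept per measurement occasion. The sketch below uses a hypothetical helper, expand_intercepts, to show this; it is not the package's internal code.

```r
# Hypothetical helper (not SimCorMultRes code): returns the clsize intercepts
# beta_10, ..., beta_T0 implied by the 'intercepts' argument.
expand_intercepts <- function(intercepts, clsize) {
  if (length(intercepts) == 1) {
    rep(intercepts, clsize)     # common intercept beta_0 at every occasion
  } else if (length(intercepts) == clsize) {
    intercepts                  # t-th element is the intercept at occasion t
  } else {
    stop("intercepts must have length 1 or clsize")
  }
}

expand_intercepts(0, clsize = 4)
expand_intercepts(c(-0.5, 0, 0.5, 1), clsize = 4)
```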

betas should be provided as a numeric vector only when \beta_{t} = \beta for all t. Otherwise, betas must be provided as a numeric matrix with clsize rows such that the t-th row contains the value of \beta_{t}. In either case, betas should reflect the order of the terms implied by xformula.
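Likewise, both forms of betas can be viewed as a clsize x p matrix whose t-th row is \beta_{t}, where p is the number of covariate terms. The helper expand_betas below is hypothetical and only illustrates that correspondence.

```r
# Hypothetical helper: returns a clsize x p matrix whose t-th row is beta_t.
expand_betas <- function(betas, clsize) {
  if (is.matrix(betas)) {
    if (nrow(betas) != clsize) stop("betas must have clsize rows")
    betas                       # time-varying effects, one row per occasion
  } else {
    # common beta for all occasions: repeat the vector in every row
    matrix(betas, nrow = clsize, ncol = length(betas), byrow = TRUE)
  }
}

# Two covariates with time-constant effects (0.2, -0.1)
expand_betas(c(0.2, -0.1), clsize = 4)
# A single covariate whose effect changes across the 4 occasions
expand_betas(matrix(c(0.1, 0.2, 0.3, 0.4), ncol = 1), clsize = 4)
```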

The appropriate use of xformula is xformula = ~ covariates, where covariates specifies the terms of the linear predictor, as in other marginal regression models.
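One way to see the order of the terms implied by xformula, and hence the order that betas should follow, is base R's model.matrix. This sketch assumes that the covariate terms are expanded the way model.matrix expands them, with the intercept column dropped because intercepts are supplied separately; the data and coefficient values are made up for illustration.

```r
xdata <- data.frame(
  x1 = c(0.5, -1.2, 0.3, 2.0),
  treatment = factor(c("A", "B", "A", "B"))
)
xformula <- ~ x1 + treatment

design <- model.matrix(xformula, data = xdata)
# Drop the "(Intercept)" column: intercepts enter through 'intercepts'
covariates <- design[, colnames(design) != "(Intercept)", drop = FALSE]
colnames(covariates)   # term order that betas should follow

# Covariate part of the linear predictor, beta' x_it (intercept omitted)
betas <- c(0.2, -0.1)  # effects of x1 and treatmentB, in that order
eta <- drop(covariates %*% betas)
eta
```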

The optional argument xdata should be provided in “long” format.
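In long format, xdata has one row per subject and measurement occasion, i.e. sample size times clsize rows. The sketch below builds such a data frame, assuming rows are sorted by subject and then by occasion, as in the Examples section where x is built with rep(..., each = cluster_size). The id and time columns only make the layout visible; they are not among the documented requirements for xdata.

```r
set.seed(2)
sample_size <- 3
cluster_size <- 4
# Rows ordered by subject, then by measurement occasion (assumed layout)
xdata_long <- data.frame(
  id   = rep(seq_len(sample_size), each = cluster_size),
  time = rep(seq_len(cluster_size), times = sample_size),
  x    = rep(rnorm(sample_size), each = cluster_size)  # subject-level covariate
)
xdata_long
```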

The NORTA method is the default option for simulating the latent random vectors denoted by e^{B}_{it} in Touloumis (2016). To import simulated values for the latent random vectors without utilizing the NORTA method, the user can employ the rlatent argument. In this case, element (i, t) of rlatent represents the realization of e^{B}_{it}.
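The NORTA idea can be sketched in three steps for the logit link: draw multivariate normal vectors with correlation cor.matrix, map each margin to the logistic distribution via the probability integral transform, and threshold at the linear predictor. This is an illustration under those assumptions (including the threshold rule Y_{it} = 1 iff e^{B}_{it} <= linear predictor), not the package's implementation, which may use a different normal sampler.

```r
set.seed(42)
sample_size <- 20000
cor_matrix <- toeplitz(c(1, 0.9, 0.9, 0.9))
cluster_size <- ncol(cor_matrix)

# Step 1: rows of z are N(0, cor_matrix), since t(U) %*% U = cor_matrix
U <- chol(cor_matrix)
z <- matrix(rnorm(sample_size * cluster_size), ncol = cluster_size) %*% U

# Step 2: probability integral transform to standard logistic margins
e_B <- qlogis(pnorm(z))

# Step 3: threshold at the linear predictor 0 + 0.2 * x (x constant per subject)
x <- rnorm(sample_size)
eta <- 0 + 0.2 * x
eta_matrix <- matrix(eta, nrow = sample_size, ncol = cluster_size)
y <- 1L * (e_B <= eta_matrix)   # sample_size x cluster_size binary matrix

colMeans(y)   # each close to the average marginal probability
```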

Value

Returns a list that has components:

Ysim

the simulated binary responses. Element (i, t) represents the realization of Y_{it}.

simdata

a data frame that includes the simulated response variables (y), the covariates specified by xformula, subject identifiers (id) and the corresponding measurement occasions (time).
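The wide matrix Ysim and the long data frame simdata hold the same responses. The sketch below shows the reshaping, assuming simdata is sorted by id and then by time; the 3 x 4 matrix is made-up data standing in for Ysim, not output of rbin.

```r
# Made-up 3 x 4 matrix standing in for Ysim (rows = subjects, columns = occasions)
Ysim <- matrix(c(1, 0, 1, 1,
                 0, 0, 1, 0,
                 1, 1, 1, 1), nrow = 3, byrow = TRUE)

long <- data.frame(
  y    = as.vector(t(Ysim)),                       # subject 1's occasions first
  id   = rep(seq_len(nrow(Ysim)), each = ncol(Ysim)),
  time = rep(seq_len(ncol(Ysim)), times = nrow(Ysim))
)
head(long)
```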

rlatent

the latent random variables denoted by e^{B}_{it} in Touloumis (2016).

Author(s)

Anestis Touloumis

References

Cario, M. C. and Nelson, B. L. (1997) Modeling and generating random vectors with arbitrary marginal distributions and correlation matrix. Technical Report, Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois.

Emrich, L. J. and Piedmonte, M. R. (1991) A method for generating high-dimensional multivariate binary variates. The American Statistician 45, 302–304.

Li, S. T. and Hammond, J. L. (1975) Generation of pseudorandom numbers with specified univariate distributions and correlation coefficients. IEEE Transactions on Systems, Man and Cybernetics 5, 557–561.

Touloumis, A. (2016) Simulating Correlated Binary and Multinomial Responses under Marginal Model Specification: The SimCorMultRes Package. The R Journal 8, 79–91.

See Also

rmult.bcl for simulating correlated nominal responses, rmult.clm, rmult.crm and rmult.acl for simulating correlated ordinal responses.

Examples

## See Example 3.5 in the Vignette.
library(SimCorMultRes)
set.seed(123)
sample_size <- 5000
cluster_size <- 4
beta_intercepts <- 0
beta_coefficients <- 0.2
# exchangeable correlation matrix of the latent normal vectors (NORTA)
latent_correlation_matrix <- toeplitz(c(1, 0.9, 0.9, 0.9))
# one covariate value per subject, repeated over the cluster_size occasions
x <- rep(rnorm(sample_size), each = cluster_size)
simulated_binary_dataset <- rbin(clsize = cluster_size,
  intercepts = beta_intercepts, betas = beta_coefficients,
  xformula = ~x, cor.matrix = latent_correlation_matrix, link = "probit")
# fit the marginal probit model by GEE and compare with the true values (0, 0.2)
library(gee)
binary_gee_model <- gee(y ~ x, family = binomial("probit"), id = id,
  data = simulated_binary_dataset$simdata)
summary(binary_gee_model)$coefficients

## See Example 3.6 in the Vignette.
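set.seed(8)
library(evd)
# each rmvevd() call returns sample_size x cluster_size draws with standard
# Gumbel margins and logistic dependence (model = "log")
simulated_latent_variables1 <- rmvevd(sample_size, dep = sqrt(1 - 0.9),
  model = "log", d = cluster_size)
simulated_latent_variables2 <- rmvevd(sample_size, dep = sqrt(1 - 0.9),
  model = "log", d = cluster_size)
# the difference of two independent standard Gumbel variables is standard
# logistic, so the latent vectors have logistic margins (logit link)
simulated_latent_variables <- simulated_latent_variables1 -
  simulated_latent_variables2
# rlatent is supplied, so NORTA is bypassed and cor.matrix is not needed
simulated_binary_dataset <- rbin(clsize = cluster_size,
  intercepts = beta_intercepts, betas = beta_coefficients,
  xformula = ~x, rlatent = simulated_latent_variables)
binary_gee_model <- gee(y ~ x, family = binomial("logit"), id = id,
  data = simulated_binary_dataset$simdata)
summary(binary_gee_model)$coefficients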

[Package SimCorMultRes version 1.9.0 Index]