sim_Y_Binary_X {sim2Dpredictr}    R Documentation

Simulate Scalar Outcomes from Simulated Spatially Dependent Binary Predictors

Description

N spatially dependent binary design vectors are simulated using sim2D_binarymap. These design vectors are then used to simulate scalar outcomes that follow a Gaussian, Binomial, or Poisson distribution.

Usage

sim_Y_Binary_X(
  N,
  B,
  rand.err = 1,
  dist,
  incl.subjectID = TRUE,
  binomial.method = "traditional",
  count.method = "traditional",
  Y.thresh = NULL,
  print.out = FALSE,
  xlim = c(0, 1),
  ylim = c(0, 1),
  im.res,
  radius.bounds = c(0.02, 0.1),
  lambda = 50,
  random.lambda = FALSE,
  lambda.sd = 10,
  lambda.bound = NULL,
  prior = "gamma",
  sub.area = FALSE,
  min.sa = c(0.1, 0.1),
  max.sa = c(0.3, 0.3),
  radius.bounds.min.sa = c(0.02, 0.05),
  radius.bounds.max.sa = c(0.08, 0.15),
  print.subj.sa = FALSE,
  print.lambda = FALSE,
  print.iter = FALSE
)

Arguments

N

A scalar value determining the number of images to create.

B

A vector of parameter values, i.e., "betas". Note that length(B) must equal p + 1 = n.row * n.col + 1; e.g., for normal outcomes Y = XB + e, with Y a scalar outcome and e the random error.

rand.err

A scalar for the random error variance when dist = "gaussian".
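
The relationship between B, the image dimensions, and rand.err can be illustrated with a minimal base R sketch. It does not call the package: the design matrix X below is a hypothetical stand-in made of independent binary pixels, and the positions of the non-zero betas are arbitrary. It assumes that rand.err is a variance, as stated above, so the standard deviation passed to rnorm() is sqrt(rand.err).

```r
set.seed(1)

n.row <- 5
n.col <- 5
p <- n.row * n.col                      # number of pixels / predictors
N <- 10                                 # number of subjects

# Hypothetical binary design matrix (independent pixels, for brevity;
# the documented function simulates spatially dependent images instead).
X <- matrix(rbinom(N * p, size = 1, prob = 0.3), nrow = N, ncol = p)

# B holds the intercept followed by one coefficient per pixel.
B <- rep(0, p + 1)
B[c(14, 15, 19)] <- 1/3                 # arbitrary non-zero positions

rand.err <- 1                           # error variance, so sd = sqrt(rand.err)
XB <- cbind(1, X) %*% B                 # column of 1s carries the intercept
Y  <- as.numeric(XB) + rnorm(N, mean = 0, sd = sqrt(rand.err))
```

In practice beta_builder, shown in the Examples, constructs B from row and column indices.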

dist

The distribution of the scalar outcome.

  • dist = "gaussian" has Y = XB + e, where e ~ N(0, rand.err).

  • dist = "binomial" is drawn from Bin(XB, XB(1-XB)) using rbinom() when binomial.method = "traditional". If binomial.method = "gaussian manual" or binomial.method = "gaussian percentile", then simulation is based on a cutoff specified via Y.thresh.

  • dist = "poisson" is drawn from Poisson(XB) using rpois().

incl.subjectID

When incl.subjectID = TRUE a column of subject indices is generated.

binomial.method

One of c("traditional", "gaussian manual", "gaussian percentile"). Only used when dist = "binomial"; determines whether draws are taken directly from a binomial distribution or from a Multivariate Normal Distribution (in the manner of dist = "gaussian") with thresholds imposed to binarize the outcomes. binomial.method = "gaussian manual" allows the user to specify specific values for categorizing outcomes. binomial.method = "gaussian percentile" allows the user to specify percentiles for binarizing the data. Both approaches use Y.thresh to specify the cutoff value(s). If binomial.method = "gaussian percentile" and Y.thresh = NULL, then the median is used as the threshold. If binomial.method = "gaussian manual" and Y.thresh = NULL, then 0 is used as the threshold. Default is binomial.method = "traditional".

count.method

One of c("traditional", "rounding"). When count.method = "traditional", the outcomes are drawn sequentially using rpois(). When count.method = "rounding", the outcomes are drawn from an MVN, then values less than or equal to 0 are set to 0, and all other values are rounded to the nearest whole number.

Y.thresh

When binomial.method = "traditional", Y.thresh = NULL (default). If binomial.method = "gaussian manual", then Y.thresh should be any scalar real number; values equal to or above this cutoff are assigned 1 and values below are assigned 0. If binomial.method = "gaussian percentile", then values equal to or above this percentile are assigned 1, and otherwise 0; in this case values should be between 0 and 1. For example, if Y.thresh = 0.9, then the cutoff is the 90th percentile.
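
The Gaussian-based options of binomial.method and count.method, together with Y.thresh, can be illustrated with a short base R sketch. This is not the package's code: the helper names (binarize_manual, binarize_percentile, round_counts) are hypothetical, the latent values stand in for draws of XB + e, and the percentile is assumed to be computed over the simulated sample with quantile().

```r
set.seed(2)

# Hypothetical latent Gaussian outcomes standing in for XB + e.
latent <- rnorm(20, mean = 0.5, sd = 1)

# "gaussian manual": Y.thresh is a cutoff on the outcome scale.
binarize_manual <- function(y, Y.thresh = NULL) {
  if (is.null(Y.thresh)) Y.thresh <- 0          # documented default cutoff
  as.integer(y >= Y.thresh)
}

# "gaussian percentile": Y.thresh is a probability between 0 and 1.
binarize_percentile <- function(y, Y.thresh = NULL) {
  if (is.null(Y.thresh)) Y.thresh <- 0.5        # documented default: the median
  as.integer(y >= quantile(y, probs = Y.thresh, names = FALSE))
}

# count.method = "rounding": values <= 0 become 0, the rest are rounded.
round_counts <- function(y) ifelse(y <= 0, 0, round(y))

y.manual <- binarize_manual(latent, Y.thresh = 1)
y.pct    <- binarize_percentile(latent, Y.thresh = 0.9)
y.count  <- round_counts(latent * 3)
```

With Y.thresh = 0.9 under the percentile method, only the largest tenth or so of the simulated outcomes are coded 1, which gives a quick way to control outcome prevalence.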

print.out

If print.out = TRUE then print the following for each subject, indexed y:

  • X[y] %*% B

  • p[y], lambda[y] for Binomial, Poisson, respectively.

This is useful to see the effect of image parameter selection and beta parameter selection on distributional parameters for the outcome of interest.

xlim, ylim

These are the 2D image limits. Defaults for both are c(0, 1). It is not recommended to alter these arguments unless changing the limits has a specific practical utility.

im.res

A vector specifying the dimension/resolution of the image. The first entry is the number of 'rows' in the lattice/image, and the second entry is the number of 'columns' in the lattice/image.

radius.bounds

A 2-element vector whose first and second entries determine the minimum and maximum radius sizes, respectively; these will be the bounds of the uniform distribution used to draw the radii. If sub.area = TRUE, then use radius.bounds.min.sa and radius.bounds.max.sa.

lambda

A scalar value specifying the mean/intensity value of the Poisson process. If random.lambda = FALSE then this is the parameter used to generate the binary image for each subject. If random.lambda = TRUE, then this is the mean parameter in the distribution used to draw subject-specific lambda.
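
To give a feel for how lambda, radius.bounds, and im.res interact, the following base R sketch generates one binary image. It is an illustration under stated assumptions, not the implementation of sim2D_binarymap: the function name sketch_binary_map is hypothetical, event centres are assumed to be uniform over the image, and a pixel is assumed to equal 1 when its centre lies within at least one disc.

```r
set.seed(3)

sketch_binary_map <- function(im.res, lambda, radius.bounds,
                              xlim = c(0, 1), ylim = c(0, 1)) {
  n.events <- rpois(1, lambda)                       # realized number of events
  cx <- runif(n.events, xlim[1], xlim[2])            # event centres (assumed uniform)
  cy <- runif(n.events, ylim[1], ylim[2])
  r  <- runif(n.events, radius.bounds[1], radius.bounds[2])

  # Pixel-centre coordinates on an im.res[1] x im.res[2] lattice.
  px <- xlim[1] + (seq_len(im.res[2]) - 0.5) * diff(xlim) / im.res[2]
  py <- ylim[1] + (seq_len(im.res[1]) - 0.5) * diff(ylim) / im.res[1]
  grid <- expand.grid(y = py, x = px)                # y varies fastest

  # A pixel is 1 if its centre lies inside at least one disc.
  hit <- rep(FALSE, nrow(grid))
  for (k in seq_len(n.events)) {
    hit <- hit | ((grid$x - cx[k])^2 + (grid$y - cy[k])^2 <= r[k]^2)
  }
  matrix(as.integer(hit), nrow = im.res[1], ncol = im.res[2])
}

img <- sketch_binary_map(im.res = c(5, 5), lambda = 50,
                         radius.bounds = c(0.02, 0.1))
```

Under these assumptions, a larger lambda or wider radius.bounds both increase the share of pixels equal to 1.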

random.lambda

random.lambda = TRUE allows the lambda (mean/intensity) parameter in the Poisson process to vary randomly by subject.

lambda.sd

Only utilized when random.lambda = TRUE, and specifies the standard deviation in the distribution used to draw subject-specific lambda.

lambda.bound

Only utilized when random.lambda = TRUE, and allows the user to specify a lower and upper bound for the subject-specific lambda; if the randomly selected value is outside of this range, then another draw is taken. This continues until a value is selected within the specified bounds. If no bounds are desired then specify lambda.bound = NULL.

prior

Only utilized when random.lambda = TRUE, and specifies the distribution from which to draw the subject-specific lambda. Options are c("gaussian", "gamma").
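
The redraw behaviour described for lambda.bound can be sketched in base R as follows. The function name draw_subject_lambda is hypothetical, and the gamma parameterization (shape and rate matched to the mean lambda and standard deviation lambda.sd) is an assumption; the package may parameterize the prior differently.

```r
set.seed(4)

draw_subject_lambda <- function(lambda, lambda.sd, prior = "gamma",
                                lambda.bound = NULL) {
  draw_once <- function() {
    if (prior == "gaussian") {
      rnorm(1, mean = lambda, sd = lambda.sd)
    } else {
      # Assumed moment matching: mean = shape / rate, var = shape / rate^2.
      rgamma(1, shape = lambda^2 / lambda.sd^2, rate = lambda / lambda.sd^2)
    }
  }
  value <- draw_once()
  if (!is.null(lambda.bound)) {
    # Redraw until the value falls within the specified bounds.
    while (value < lambda.bound[1] || value > lambda.bound[2]) {
      value <- draw_once()
    }
  }
  value
}

subj.lambda <- replicate(10, draw_subject_lambda(50, 10, prior = "gamma",
                                                 lambda.bound = c(30, 70)))
```

Very narrow bounds relative to lambda.sd would make such a loop redraw many times, which is the situation print.iter = TRUE is meant to expose.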

sub.area

When sub.area = TRUE, a random sub-section of the image is chosen, within which the Poisson process is used to generate the binary image.

min.sa, max.sa

Only utilized when sub.area = TRUE, and determines the width and height of the minimum and maximum sub-areas; e.g., if min.sa = c(0.1, 0.1), then the smallest possible random sub-area is a 0.1 x 0.1 square.

radius.bounds.min.sa, radius.bounds.max.sa

Only utilized when sub.area = TRUE, and specifies radius.bounds for the minimum and maximum sub-areas, respectively. This information is used to adaptively alter the bounds in between the minimum and maximum sub-areas.

print.subj.sa, print.lambda, print.iter

These arguments are either TRUE or FALSE, and define print options for checking that the function is working as the user intends. print.subj.sa = TRUE prints the x- and y-limits for each subject's sub-area. print.lambda = TRUE prints each subject's mean and realized events; the means will be the same unless random.lambda = TRUE, but the number of realized events will always vary. print.iter = TRUE is only used when random.lambda = TRUE and is.null(lambda.bound) = FALSE, and shows iterations for re-drawing when the randomly selected intensity is outside the specified bounds.

Value

A data frame where each row consists of a single subject's data. Column 1 is the outcome, Y, and each successive column contains the subject's predictor values.

Note

Careful parameter selection, i.e. B, is necessary to ensure that simulated outcomes are reasonable; in particular, counts arising from the Poisson distribution can be unnaturally large.

References

Cressie N, Wikle CK (2011). Statistics for Spatio-Temporal Data, Wiley Series in Probability and Statistics. John Wiley & Sons, Hoboken, NJ.

Ripley BD (1987). Stochastic Simulation. John Wiley & Sons. doi:10.1002/9780470316726.

Examples


## Define non-zero beta values
Bex <- beta_builder(row.index = c(3, 3, 4), 
                    col.index = c(3, 4, 3),
                    im.res = c(5, 5),
                    B0 = 0, B.values = rep(1/3, 3),
                    output.indices = FALSE)
## Simulate Datasets
## parameter values
Nex <- 10
set.seed(28743)

Gauss.ex <- sim_Y_Binary_X(N = Nex, 
                           B = Bex,
                           dist = "gaussian", 
                           im.res = c(5, 5))
hist(Gauss.ex$Y)

## direct draws from binomial
Bin.ex <- sim_Y_Binary_X(N = Nex, 
                         B = Bex, 
                         im.res = c(5, 5),
                         dist = "binomial", 
                         print.out = TRUE)
table(Bin.ex$Y)

[Package sim2Dpredictr version 0.1.1 Index]