R: Simulate Scalar Outcomes from Simulated Spatially Correlated...

sim_Y_MVN_X {sim2Dpredictr}

R Documentation

Simulate Scalar Outcomes from Simulated Spatially Correlated Predictors

Description

N spatially correlated design vectors are simulated from an MVN. These design vectors are used to then simulate scalar outcomes that have one of Gaussian, Binomial, Multinomial or Poisson distributions.

Usage

sim_Y_MVN_X(
  N,
  B,
  L = NULL,
  R = NULL,
  S = NULL,
  Q = NULL,
  use.spam = TRUE,
  mu = 0,
  rand.err = 1,
  dist = "gaussian",
  V = NULL,
  incl.subjectID = TRUE,
  threshold.method = "none",
  Y.thresh = NULL,
  X.categorical = FALSE,
  X.num.categories = 2,
  X.category.type = "percentile",
  X.manual.thresh = NULL,
  X.cat.names = NULL,
  print.out = FALSE
)

Arguments

`N`	The number of draws to take from MVN; i.e., the number of subjects.
`B`	A vector parameter values; i.e. "betas". Note that `length(B)` must equal `p + 1 = n.row * n.col + 1`; e.g. for normal outcomes `Y = XB + e` with `Y` a scalar outcome and `e` the random error. Note that when `dist = "multinomial"` then `B` should be a list with length equal to `V - 1`, i.e., should contain parameter values associated with all categories except the reference category. Alternatively, when `dist = "multinomial"` `B` may be a list of length `V` if one desires to specify parameters for every category, i.e., the over-parameterized model used in Friedman (2010).
`L`, `R`	`L` and `R` are lower and upper triangular matrices, respectively, and are the Cholesky factor(s) of the desired covariance matrix for the MVN. Obtain `L` or `R` via `chol_s2Dp()` with settings `triangle = "lower"` or `triangle = "upper"`, respectively. Specify either `L` or `R`, but NOT both.
`S`, `Q`	A covariance or precision matrix respectively. These are for use with `spam`, and can be extracted from output of `chol_s2Dp` after choosing `return.cov = TRUE` or `return.prec = TRUE`, respectively.
`use.spam`	Logical. If `use.spam = TRUE` then use tools from the R package `spam`; otherwise, base R functions are employed. For large dimension MVN with sparse correlation structure, `spam` is recommended; otherwise, base R may be faster. Defaults to `FALSE`. Requires either the covariance matrix `S` or precision matrix, `Q`, that corresponds to the Cholesky factor.
`mu`	One of the following: A single scalar value for common mean. A vector of length `nrow(R)` (equivalently `nrow(R)`) containing means for the MVN.
`rand.err`	A vector for the random error standard deviation when `dist = "gaussian"`, or thresholding is used to obtain non-Normal draws. Must have length 1 or length N.
`dist`	The distribution of the scalar outcome. `dist = "gaussian"` has `Y = XB + e`, where `e ~ N(0, rand.err)`. `dist = "binomial"`: Y is drawn from a binomial distribution with probability of "success" equal to `1 / (1 + 1 / exp(XB))` using `rbinom()` when `binary.method = "traditional"`. If `binary.method = "gaussian"`, then simulation is based on a cutoff using `binary.cutoff`. `dist = "multinomial"`: Y is drawn from `sample()` using probabilities generated based on Chapter 6.1.3 of Agresti (2007) when `length(B) = V - 1` or Friedman (2010) when the `length(B) = V`. Threshold-based approaches are not currently supported. `dist = "poisson"`: Y is drawn from `Poisson(exp(XB))` using `rpois()`.
`V`	A numeric value stating the number of categories desired when `dist = "multinomial"`.
`incl.subjectID`	When `incl.subjectID = TRUE` a column of subject indices is generated.
`threshold.method`	One of `"none", "manual", "percentile", "round"`. When `"none"` draws from Binomial or Poisson distributions are taken subject-wise using base `R` functions. For the remaining options, draws are first taken from a Normal distribution and then thresholded. `"manual"` uses `Y.thresh` to manually select a cutoff, `"percentile"` uses `Y.thresh` to select percentiles used to bin outcomes, and `"round"` sets values equal or less than 0 to 0, and rounds all positive values to the nearest whole number.
`Y.thresh`	A manual value used to threshold when `threshold.method = "manual"`; values equal or greater than the cutoff are assigned 1 and all others 0. When `threshold.method = "percentile"`, a percentile to use to bin outcomes.
`X.categorical`	Default is `X.categorical = FALSE`. If `X.categorical = TRUE` then thresholds are applied to categorize each predictor/image value.
`X.num.categories`	A scalar value denoting the number of categories in which to divide the data.
`X.category.type`	Tells R how to categorize the data. Options are `X.category.type = c("percentile", "manual")`. If `X.category.type = "percentile"` then the data are divided into percentiles based on `X.num.categories`; e.g. if `X.num.categories = 4` then the values are divided into quartiles, and values in Q1 equal 0, between Q1 and Q2 equal 1, between Q2 and Q3 equal 2, and greater than Q3 equal 3. If `X.category.type = "manual"` then specify the cutoff points with `X.manual.thresh`.
`X.manual.thresh`	A vector containing the thresholds for categorizing the values; e.g. if `X.num.categories = 4` and `X.manual.thresh = c(-3, 1, 17)`, then values less than -3 are set to 0, equal or greater than -3 and less than 1 are set to 1, equal or greater than 1 but less than 17 are set to 2, and equal or greater than 17 are set to 3. Note that `length(X.manual.thresh)` must always equal `X.num.categories - 1`.
`X.cat.names`	A vector of category names. If `X.cat.names = NULL` then the initial integers assigned are left as the values; the names in `X.cat.names` are assigned in ascending order.
`print.out`	If `print.out = TRUE` then print the following for each subject, indexed y: `X[y] %*% B` `p[y]`, `lambda[y]` for Binomial, Poisson, respectively. This is useful to see the effect of image parameter selection and beta parameter selection on distributional parameters for the outcome of interest.

Value

A data frame where each row consists of a single subject's data. Col 1 is the outcome, Y, and each successive column contains the subject predictor values.

Note

Careful parameter selection, i.e. B, is necessary to ensure that simulated outcomes are reasonable; in particular, counts arising from the Poisson distribution can be unnaturally large.

References

Furrer R, Sain SR (2010). “spam: A Sparse Matrix R Package with Emphasis on MCMC Methods for Gaussian Markov Random Fields.” Journal of Statistical Software, 36(10), 1-25. https://www.jstatsoft.org/v36/i10/.

Ripley BD (1987). Stochastic Simulation. John Wiley & Sons. doi:10.1002/9780470316726.

Rue H (2001). “Fast Sampling of Gaussian Markov Random Fields.” Journal of the Royal Statistical Society B, 63, 325-338. doi:10.1111/1467-9868.00288.

Agresti A (2007). An Introduction to Categorical Analysis, 2nd edition. John Wiley & Sons, Hoboken, New Jersey.

Friedman J, Hastie T, Tibshirani R (2010). “Regularization paths for generalized linear models via coordinate descent.” Journal of Statistical Software, 33, 1-22. doi:10.18637/jss.v033.i01.

Examples

## generate precision matrix and take Cholesky decomposition
Rpre <- chol_s2Dp(im.res = c(3, 3), matrix.type = "prec",
                  use.spam = TRUE, neighborhood = "ar1",
                  triangle = "upper", return.prec = TRUE)
## Generate correlation matrix & take Cholesky decomposition
Rcov <- chol_s2Dp(corr.structure = "ar1", im.res = c(3, 3), 
                  rho = 0.5,
                  triangle = "upper",
                  use.spam = FALSE, neighborhood = "none")

## Define non-zero beta values
Bex <- beta_builder(row.index = c(2, 3), 
                    col.index = c(3, 3),
                    im.res = c(3, 3),
                    B0 = 0, B.values = rep(1, 2),
                    output.indices = FALSE)
## Simulate Datasets
## parameter values
Nex = 100
set.seed(28743)

## with precision matrix
Gauss.exp <- sim_Y_MVN_X(N = Nex, B = Bex,
                         R = Rpre$R, Q = Rpre$Q,
                         dist = "gaussian")
hist(Gauss.exp$Y)

## with covariance matrix
Gauss.exc <- sim_Y_MVN_X(N = Nex, B = Bex,
                         R = Rcov$R, S = Rcov$S,
                         dist = "gaussian")
hist(Gauss.exc$Y)

## direct draws from binomial
Bin.ex <- sim_Y_MVN_X(N = Nex, B = Bex, R = Rcov$R, S = Rcov$S,
                      dist = "binomial", print.out = TRUE)
table(Bin.ex$Y)

## manual cutoff
Bin.ex2 <- sim_Y_MVN_X(N = Nex, B = Bex,
                       R = Rcov$R, S = Rcov$S,
                       dist = "binomial",
                       threshold.method = "manual",
                       Y.thresh = 1.25)
table(Bin.ex2$Y)

## percentile cutoff
Bin.ex3 <- sim_Y_MVN_X(N = Nex, B = Bex,
                       R = Rcov$R, S = Rcov$S,
                       dist = "binomial",
                       threshold.method = "percentile",
                       Y.thresh = 0.75)
table(Bin.ex3$Y)

## Poisson Example - note the large counts
Pois.ex <- sim_Y_MVN_X(N = Nex, B = Bex,
                       R = Rcov$R, S = Rcov$S,
                       dist = "poisson", print.out = TRUE)
mean(Pois.ex$Y)
quantile(Pois.ex$Y, 
         probs = c(0, 0.1, 0.25, 0.45, 0.5,
                   0.75, 0.9, 0.95, 0.99, 1))
hist(Pois.ex$Y)

[Package sim2Dpredictr version 0.1.1 Index]