R: Simulate Spatially Correlated MVN Data

sim_MVN_X {sim2Dpredictr}

R Documentation

Simulate Spatially Correlated MVN Data

Description

Takes N draws from a Multivariate Normal (MVN) distribution using either base R or the R package spam. This function requires the Cholesky decomposition of the desired covariance matrix.

Usage

sim_MVN_X(
  N,
  mu = 0,
  L = NULL,
  R = NULL,
  S = NULL,
  Q = NULL,
  use.spam = FALSE,
  use.MASS = FALSE,
  X.categorical = FALSE,
  X.num.categories = 2,
  X.category.type = "percentile",
  X.percentiles = NULL,
  X.manual.thresh = NULL,
  X.cat.names = NULL
)

Arguments

`N`	The number of draws to take from MVN; i.e., the number of subjects.
`mu`	One of the following: A single scalar value for common mean. A vector of length `nrow(R)` (equivalently `nrow(R)`) containing means for the MVN.
`L`, `R`	`L` and `R` are lower and upper triangular matrices, respectively, and are the Cholesky factor(s) of the desired covariance matrix for the MVN. Obtain `L` or `R` via `chol_s2Dp()` with settings `triangle = "lower"` or `triangle = "upper"`, respectively. Specify either `L` or `R`, but NOT both.
`S`, `Q`	A covariance or precision matrix respectively. These are for use with `spam`, and can be extracted from output of `chol_s2Dp` after choosing `return.cov = TRUE` or `return.prec = TRUE`, respectively.
`use.spam`	Logical. If `use.spam = TRUE` then use tools from the R package `spam`; otherwise, base R functions are employed. For large dimension MVN with sparse correlation structure, `spam` is recommended; otherwise, base R may be faster. Defaults to `FALSE`. Requires either the covariance matrix `S` or precision matrix, `Q`, that corresponds to the Cholesky factor.
`use.MASS`	Logical. When `TRUE` draws X from MVN using `mvrnorm` from `MASS`. Note that this requires specification of the covariance matrix, `S`. Specifying the precision matrix instead may slow down the process for large dimensions. Recommended to use `spam` to generate draws when specifying a precision matrix, `Q`.
`X.categorical`	Default is `X.categorical = FALSE`. If `X.categorical = TRUE` then thresholds are applied to categorize each predictor/image value.
`X.num.categories`	A scalar value denoting the number of categories in which to divide the data.
`X.category.type`	Tells R how to categorize the data. Options are `X.category.type = c("percentile", "manual")`. If `X.category.type = "percentile"` then the data are divided into percentiles based on `X.num.categories`; e.g. if `X.num.categories = 4` then the values are divided into quartiles, and values in Q1 equal 0, between Q1 and Q2 equal 1, between Q2 and Q3 equal 2, and greater than Q3 equal 3. If `X.category.type = "manual"` then specify the cutoff points with `X.manual.thresh`.
`X.percentiles`	A vector of percentiles to be used in thresholding when `X.categorical = TRUE` and `X.category.type = "percentile"`. The length of this vector should equal the number of categories minus one, and all values should be between zero and one.
`X.manual.thresh`	A vector containing the thresholds for categorizing the values; e.g. if `X.num.categories = 4` and `X.manual.thresh = c(-3, 1, 17)`, then values less than -3 are set to 0, equal or greater than -3 and less than 1 are set to 1, equal or greater than 1 but less than 17 are set to 2, and equal or greater than 17 are set to 3. Note that `length(X.manual.thresh)` must always equal `X.num.categories - 1`.
`X.cat.names`	A vector of category names. If `X.cat.names = NULL` then the initial integers assigned are left as the values; the names in `X.cat.names` are assigned in ascending order.

Value

Matrix of dimension N x (nrow(L)) (or equivalently N x (nrow(R))) where each row is draw from MVN, and each column represents a different "variable"; e.g. location in an image.

Note

This function requires the Cholesky decomposition of the desired covariance matrix for the MVN; this allows for using this function in simulating multiple datasets of N MVN draws while only taking the Cholesky decomposition of the covariance matrix once.

References

Furrer R, Sain SR (2010). “spam: A Sparse Matrix R Package with Emphasis on MCMC Methods for Gaussian Markov Random Fields.” Journal of Statistical Software, 36(10), 1-25. https://www.jstatsoft.org/v36/i10/.

Ripley BD (1987). Stochastic Simulation. John Wiley & Sons. doi:10.1002/9780470316726.

Rue H (2001). “Fast Sampling of Gaussian Markov Random Fields.” Journal of the Royal Statistical Society B, 63, 325-338. doi:10.1111/1467-9868.00288.

Examples

## verify MVN with base R
set.seed(732)
Lex <- chol_s2Dp(corr.structure = "ar1",
                 im.res = c(3, 3), 
                 rho = 0.25,
                 sigma = 1, 
                 use.spam = FALSE, 
                 corr.min = 0.02,
                 triangle = "lower", 
                 return.cov = TRUE)
XbR = sim_MVN_X(N = 1000, mu = 0, L = Lex$L)

apply(XbR, 2, mean)
cov(XbR)
Lex$S

## verify MVN with \code{spam}
set.seed(472)
Rex <- chol_s2Dp(im.res = c(3, 3), matrix.type = "prec",
                use.spam = TRUE, neighborhood = "ar1",
                triangle = "upper", return.prec = TRUE)

Xspam = sim_MVN_X(N = 1000, mu = 0, R = Rex$R, Q = Rex$Q)

apply(Xspam, 2, mean)
solve(cov(Xspam))
as.matrix(Rex$Q)

## Categories
set.seed(832)
Xtest <- sim_MVN_X(N = 30, mu = 0, L = Lex$L,
                   X.categorical = TRUE,
                   X.num.categories = 3,
                   X.category.type = "percentile",
                   X.cat.names = c("A", "B", "C"))
Xtest

[Package sim2Dpredictr version 0.1.1 Index]