sim_MVN_X {sim2Dpredictr}R Documentation

Simulate Spatially Correlated MVN Data

Description

Takes N draws from a Multivariate Normal (MVN) distribution using either base R or the R package spam. This function requires the Cholesky decomposition of the desired covariance matrix.

Usage

sim_MVN_X(
  N,
  mu = 0,
  L = NULL,
  R = NULL,
  S = NULL,
  Q = NULL,
  use.spam = FALSE,
  use.MASS = FALSE,
  X.categorical = FALSE,
  X.num.categories = 2,
  X.category.type = "percentile",
  X.percentiles = NULL,
  X.manual.thresh = NULL,
  X.cat.names = NULL
)

Arguments

N

The number of draws to take from MVN; i.e., the number of subjects.

mu

One of the following:

  • A single scalar value for common mean.

  • A vector of length nrow(R) (equivalently nrow(R)) containing means for the MVN.

L, R

L and R are lower and upper triangular matrices, respectively, and are the Cholesky factor(s) of the desired covariance matrix for the MVN. Obtain L or R via chol_s2Dp() with settings triangle = "lower" or triangle = "upper", respectively. Specify either L or R, but NOT both.

S, Q

A covariance or precision matrix respectively. These are for use with spam, and can be extracted from output of chol_s2Dp after choosing return.cov = TRUE or return.prec = TRUE, respectively.

use.spam

Logical. If use.spam = TRUE then use tools from the R package spam; otherwise, base R functions are employed. For large dimension MVN with sparse correlation structure, spam is recommended; otherwise, base R may be faster. Defaults to FALSE. Requires either the covariance matrix S or precision matrix, Q, that corresponds to the Cholesky factor.

use.MASS

Logical. When TRUE draws X from MVN using mvrnorm from MASS. Note that this requires specification of the covariance matrix, S. Specifying the precision matrix instead may slow down the process for large dimensions. Recommended to use spam to generate draws when specifying a precision matrix, Q.

X.categorical

Default is X.categorical = FALSE. If X.categorical = TRUE then thresholds are applied to categorize each predictor/image value.

X.num.categories

A scalar value denoting the number of categories in which to divide the data.

X.category.type

Tells R how to categorize the data. Options are X.category.type = c("percentile", "manual"). If X.category.type = "percentile" then the data are divided into percentiles based on X.num.categories; e.g. if X.num.categories = 4 then the values are divided into quartiles, and values in Q1 equal 0, between Q1 and Q2 equal 1, between Q2 and Q3 equal 2, and greater than Q3 equal 3. If X.category.type = "manual" then specify the cutoff points with X.manual.thresh.

X.percentiles

A vector of percentiles to be used in thresholding when X.categorical = TRUE and X.category.type = "percentile". The length of this vector should equal the number of categories minus one, and all values should be between zero and one.

X.manual.thresh

A vector containing the thresholds for categorizing the values; e.g. if X.num.categories = 4 and X.manual.thresh = c(-3, 1, 17), then values less than -3 are set to 0, equal or greater than -3 and less than 1 are set to 1, equal or greater than 1 but less than 17 are set to 2, and equal or greater than 17 are set to 3. Note that length(X.manual.thresh) must always equal X.num.categories - 1.

X.cat.names

A vector of category names. If X.cat.names = NULL then the initial integers assigned are left as the values; the names in X.cat.names are assigned in ascending order.

Value

Matrix of dimension N x (nrow(L)) (or equivalently N x (nrow(R))) where each row is draw from MVN, and each column represents a different "variable"; e.g. location in an image.

Note

This function requires the Cholesky decomposition of the desired covariance matrix for the MVN; this allows for using this function in simulating multiple datasets of N MVN draws while only taking the Cholesky decomposition of the covariance matrix once.

References

Furrer R, Sain SR (2010). “spam: A Sparse Matrix R Package with Emphasis on MCMC Methods for Gaussian Markov Random Fields.” Journal of Statistical Software, 36(10), 1-25. https://www.jstatsoft.org/v36/i10/.

Ripley BD (1987). Stochastic Simulation. John Wiley & Sons. doi:10.1002/9780470316726.

Rue H (2001). “Fast Sampling of Gaussian Markov Random Fields.” Journal of the Royal Statistical Society B, 63, 325-338. doi:10.1111/1467-9868.00288.

Examples

## verify MVN with base R
set.seed(732)
Lex <- chol_s2Dp(corr.structure = "ar1",
                 im.res = c(3, 3), 
                 rho = 0.25,
                 sigma = 1, 
                 use.spam = FALSE, 
                 corr.min = 0.02,
                 triangle = "lower", 
                 return.cov = TRUE)
XbR = sim_MVN_X(N = 1000, mu = 0, L = Lex$L)

apply(XbR, 2, mean)
cov(XbR)
Lex$S

## verify MVN with \code{spam}
set.seed(472)
Rex <- chol_s2Dp(im.res = c(3, 3), matrix.type = "prec",
                use.spam = TRUE, neighborhood = "ar1",
                triangle = "upper", return.prec = TRUE)

Xspam = sim_MVN_X(N = 1000, mu = 0, R = Rex$R, Q = Rex$Q)

apply(Xspam, 2, mean)
solve(cov(Xspam))
as.matrix(Rex$Q)

## Categories
set.seed(832)
Xtest <- sim_MVN_X(N = 30, mu = 0, L = Lex$L,
                   X.categorical = TRUE,
                   X.num.categories = 3,
                   X.category.type = "percentile",
                   X.cat.names = c("A", "B", "C"))
Xtest


[Package sim2Dpredictr version 0.1.1 Index]