generateData {covdepGE}    R Documentation

Generate Covariate-Dependent Data

Description

Generate a 1-dimensional extraneous covariate and $p$-dimensional Gaussian data with a precision matrix that varies as a continuous function of the extraneous covariate. The data are distributed similarly to those used in the simulation study from (1).

Usage

generateData(p = 5, n1 = 60, n2 = 60, n3 = 60, Z = NULL, true_precision = NULL)

Arguments

p

positive integer; number of variables in the data matrix. 5 by default

n1

positive integer; number of observations in the first interval. 60 by default

n2

positive integer; number of observations in the second interval. 60 by default

n3

positive integer; number of observations in the third interval. 60 by default

Z

NULL or numeric vector; extraneous covariate values for each observation. If NULL, Z will be generated from a uniform distribution on each of the intervals. NULL by default

true_precision

NULL or list of matrices of dimension $p \times p$; true precision matrix for each observation. If NULL, the true precision matrices will be generated dependent on Z. NULL by default

Value

Returns a list with the following values:

X

a (n1 + n2 + n3) $\times p$ numeric matrix, where the $i$-th row is drawn from a $p$-dimensional Gaussian with mean 0 and precision matrix true_precision[[i]]

Z

a (n1 + n2 + n3) $\times 1$ numeric matrix, where the $i$-th entry is the extraneous covariate $z_i$ for observation $i$

true_precision

list of n1 + n2 + n3 matrices of dimension $p \times p$; the $i$-th matrix is the precision matrix for the $i$-th observation

interval

vector of length n1 + n2 + n3; interval assignments for each of the observations, where the $i$-th entry is the interval assignment for the $i$-th observation
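
As an illustration of how a row of X relates to its precision matrix, the following base R sketch draws zero-mean Gaussian rows whose covariance is the inverse of a given precision matrix. The helper sample_from_precision is hypothetical and is not part of covdepGE; the package may generate X differently.

```r
# Hypothetical helper (not from covdepGE): draw n rows from N(0, solve(Omega)),
# where Omega is a positive definite precision matrix.
sample_from_precision <- function(n, Omega) {
  Sigma <- solve(Omega)  # covariance is the inverse of the precision
  R <- chol(Sigma)       # upper triangular factor with t(R) %*% R == Sigma
  # each row of the standard normal matrix, multiplied by R, has covariance Sigma
  matrix(rnorm(n * nrow(Omega)), nrow = n) %*% R
}

set.seed(1)
p <- 5
Omega <- diag(2, p)              # 2 on the diagonal
Omega[2, 3] <- Omega[3, 2] <- 1  # 1 in the (2, 3)/(3, 2) positions

# with many draws, the inverse sample covariance should be close to Omega
draws <- sample_from_precision(20000, Omega)
emp_prec <- solve(cov(draws))
round(emp_prec, 1)
```

Factoring the covariance once and multiplying a matrix of standard normal draws avoids repeating the inversion for every row.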

Extraneous Covariate

If Z = NULL, then the generation of Z is as follows:

The first n1 observations have $z_i$ drawn from a uniform distribution on the interval $(-3, -1)$ (the first interval).

Observations n1 + 1 to n1 + n2 have $z_i$ drawn from a uniform distribution on the interval $(-1, 1)$ (the second interval).

Observations n1 + n2 + 1 to n1 + n2 + n3 have $z_i$ drawn from a uniform distribution on the interval $(1, 3)$ (the third interval).
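
The three steps above can be sketched in base R as follows. The helper generate_Z is hypothetical and only mirrors the description; it is not the package's source, and its random draws need not match those of generateData for the same seed.

```r
# Hypothetical helper (not from covdepGE): piecewise-uniform covariate
generate_Z <- function(n1 = 60, n2 = 60, n3 = 60) {
  Z <- c(runif(n1, -3, -1),  # first interval
         runif(n2, -1, 1),   # second interval
         runif(n3, 1, 3))    # third interval
  interval <- rep(1:3, times = c(n1, n2, n3))
  list(Z = matrix(Z, ncol = 1), interval = interval)
}

set.seed(12)
sim <- generate_Z()

# range of the covariate within each interval
tapply(sim$Z[, 1], sim$interval, range)
```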

Precision Matrices

If true_precision = NULL, then the generation of the true precision matrices is as follows:

All precision matrices have 2 on the diagonal and 1 in the $(2, 3)/(3, 2)$ positions.

Observations in the first interval have a 1 in the $(1, 2)/(2, 1)$ positions, while observations in the third interval have a 1 in the $(1, 3)/(3, 1)$ positions.

Observations in the second interval have 2 entries that vary as a linear function of their extraneous covariate. Let $\beta = 1/2$. Then, the $(1, 2)/(2, 1)$ positions for the $i$-th observation in the second interval are $\beta\cdot(1 - z_i)$, while the $(1, 3)/(3, 1)$ entries are $\beta\cdot(1 + z_i)$.

Thus, as $z_i$ approaches $-1$ from the right, the associated precision matrix becomes more similar to the matrix for observations in the first interval. Similarly, as $z_i$ approaches $1$ from the left, the matrix becomes more similar to the matrix for observations in the third interval.
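
The construction described in this section can be sketched in base R as a function of a single covariate value. The helper precision_at is hypothetical and is not part of covdepGE; it assumes that every off-diagonal entry not mentioned above is 0, and it treats the endpoints -1 and 1 as part of the second interval.

```r
# Hypothetical helper (not from covdepGE): precision matrix for covariate z.
# Assumption: off-diagonal entries not named in the description are 0.
precision_at <- function(z, p = 5, beta = 1 / 2) {
  stopifnot(p >= 3, z >= -3, z <= 3)
  Omega <- diag(2, p)              # 2 on the diagonal
  Omega[2, 3] <- Omega[3, 2] <- 1  # shared by all intervals
  if (z < -1) {                    # first interval
    w12 <- 1
    w13 <- 0
  } else if (z > 1) {              # third interval
    w12 <- 0
    w13 <- 1
  } else {                         # second interval: linear in z
    w12 <- beta * (1 - z)
    w13 <- beta * (1 + z)
  }
  Omega[1, 2] <- Omega[2, 1] <- w12
  Omega[1, 3] <- Omega[3, 1] <- w13
  Omega
}

# the (1, 2) and (1, 3) entries along a grid of covariate values
z_grid <- seq(-3, 3, by = 0.5)
entries <- t(sapply(z_grid, function(z) {
  Omega <- precision_at(z)
  c(z = z, w12 = Omega[1, 2], w13 = Omega[1, 3])
}))
entries
```

At $z = -1$ the linear formulas give $\beta\cdot 2 = 1$ and $\beta\cdot 0 = 0$, and at $z = 1$ they give 0 and 1, so under the stated assumption the matrices agree with those of the neighboring intervals at the boundaries.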

Examples

## Not run: 
library(covdepGE)
library(ggplot2)

# get the data
set.seed(12)
data <- generateData()
X <- data$X
Z <- data$Z
interval <- data$interval
prec <- data$true_precision

# get overall and within interval sample sizes
n <- nrow(X)
n1 <- sum(interval == 1)
n2 <- sum(interval == 2)
n3 <- sum(interval == 3)

# visualize the distribution of the extraneous covariate
ggplot(data.frame(Z = Z, interval = as.factor(interval))) +
  geom_histogram(aes(Z, fill = interval), color = "black", bins = n %/% 5)

# visualize the true precision matrices in each of the intervals

# interval 1
matViz(prec[[1]], incl_val = TRUE) +
  ggtitle(paste0("True precision matrix, interval 1, observations 1,...,", n1))

# interval 2 (varies continuously with Z)
cat("\nInterval 2, observations ", n1 + 1, ",...,", n1 + n2, sep = "")
int2_mats <- prec[interval == 2]
int2_inds <- c(5, n2 %/% 2, n2 - 5)
lapply(int2_inds, function(j) matViz(int2_mats[[j]], incl_val = TRUE) +
         ggtitle(paste("True precision matrix, interval 2, observation", j + n1)))

# interval 3
matViz(prec[[length(prec)]], incl_val = TRUE) +
  ggtitle(paste0("True precision matrix, interval 3, observations ",
                 n1 + n2 + 1, ",...,", n1 + n2 + n3))

# fit the model and visualize the estimated graphs
(out <- covdepGE(X, Z))
plot(out)

# visualize the posterior inclusion probabilities for variables (1, 2) and (1, 3)
inclusionCurve(out, 1, 2)
inclusionCurve(out, 1, 3)

## End(Not run)

[Package covdepGE version 1.0.1 Index]