nbfar_sim {nbfar}R Documentation

Simulated data for testing NBFAR and NBRRR model

Description

Simulate response and covariates for multivariate negative binomial regression with a low-rank and sparse coefficient matrix. Coefficient matrix is expressed in terms of U (left singular vector), D (singular values) and V (right singular vector).

Usage

nbfar_sim(U, D, V, n, Xsigma, C0, disp, depth)

Arguments

U

specified value of U

D

specified value of D

V

specified value of V

n

sample size

Xsigma

covariance matrix used to generate predictors in X

C0

intercept value in the coefficient matrix

disp

dispersion parameter of the generative model

depth

log of the sequencing depth of the microbiome data (used as an offset in the simulated multivariate negative binomial regression model)

Value

Y

Generated response matrix

X

Generated predictor matrix

References

Mishra, A., Müller, C. (2022) Negative binomial factor regression models with application to microbiome data analysis. https://doi.org/10.1101/2021.11.29.470304

Examples

## Model specification:
SD <- 123
set.seed(SD)
p <- 100; n <- 200
pz <- 0
nrank <- 3                # true rank
rank.est <- 5             # estimated rank
nlam <- 20                # number of tuning parameter
s  = 0.5
q <- 30
control <- nbfar_control()  # control parameters
#
#
## Generate data
D <- rep(0, nrank)
V <- matrix(0, ncol = nrank, nrow = q)
U <- matrix(0, ncol = nrank, nrow = p)
#
U[, 1] <- c(sample(c(1, -1), 8, replace = TRUE), rep(0, p - 8))
U[, 2] <- c(rep(0, 5), sample(c(1, -1), 9, replace = TRUE), rep(0, p - 14))
U[, 3] <- c(rep(0, 11), sample(c(1, -1), 9, replace = TRUE), rep(0, p - 20))
#
  # for similar type response type setting
  V[, 1] <- c(rep(0, 8), sample(c(1, -1), 8,
    replace =
      TRUE
  ) * runif(8, 0.3, 1), rep(0, q - 16))
  V[, 2] <- c(rep(0, 20), sample(c(1, -1), 8,
    replace =
      TRUE
  ) * runif(8, 0.3, 1), rep(0, q - 28))
  V[, 3] <- c(
    sample(c(1, -1), 5, replace = TRUE) * runif(5, 0.3, 1), rep(0, 23),
    sample(c(1, -1), 2, replace = TRUE) * runif(2, 0.3, 1), rep(0, q - 30)
  )
U[, 1:3] <- apply(U[, 1:3], 2, function(x) x / sqrt(sum(x^2)))
V[, 1:3] <- apply(V[, 1:3], 2, function(x) x / sqrt(sum(x^2)))
#
D <- s * c(4, 6, 5) # signal strength varries as per the value of s
or <- order(D, decreasing = TRUE)
U <- U[, or]
V <- V[, or]
D <- D[or]
C <- U %*% (D * t(V)) # simulated coefficient matrix
intercept <- rep(0.5, q) # specifying intercept to the model:
C0 <- rbind(intercept, C)
#
Xsigma <- 0.5^abs(outer(1:p, 1:p, FUN = "-"))
# Simulated data
sim.sample <- nbfar_sim(U, D, V, n, Xsigma, C0,disp = 3, depth = 10)  # Simulated sample
# Dispersion parameter
X <- sim.sample$X[1:n, ]
Y <- sim.sample$Y[1:n, ]


[Package nbfar version 0.1 Index]