nbfar_sim {nbfar} | R Documentation |
Simulated data for testing NBFAR and NBRRR models
Description
Simulate a response matrix and covariates for multivariate negative binomial regression with a low-rank and sparse coefficient matrix. The coefficient matrix is expressed in terms of U (left singular vectors), D (singular values), and V (right singular vectors).
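As a minimal sketch of the decomposition described above, the following builds a small sparse, low-rank matrix C = U diag(D) V^T. The dimensions and the nonzero entries are hypothetical toy values chosen only for illustration. It also shows that the recycling idiom `U %*% (D * t(V))` gives the same matrix as the explicit product.

```r
# Toy dimensions (hypothetical, for illustration only)
p <- 6; q <- 4; r <- 2

# Sparse singular vectors with disjoint supports, so columns are orthogonal
U <- matrix(0, nrow = p, ncol = r)
V <- matrix(0, nrow = q, ncol = r)
U[1:2, 1] <- c(1, -1); U[3:4, 2] <- c(1, 1)
V[1:2, 1] <- c(1, 1);  V[3:4, 2] <- c(-1, 1)

# Scale each column to unit length
U <- apply(U, 2, function(x) x / sqrt(sum(x^2)))
V <- apply(V, 2, function(x) x / sqrt(sum(x^2)))
D <- c(3, 2)  # singular values in decreasing order

# Two equivalent ways to form C = U diag(D) V^T
C_explicit <- U %*% diag(D) %*% t(V)
C_recycled <- U %*% (D * t(V))  # D[i] scales row i of t(V)

same_matrix <- isTRUE(all.equal(C_explicit, C_recycled))
rank_C <- qr(C_explicit)$rank   # numerical rank of C
sv <- svd(C_explicit)$d         # singular values of C
```

Because the columns of U and V are orthonormal here, the nonzero singular values of the product are the entries of D, and rows of C outside the support of U are exactly zero.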
Usage
nbfar_sim(U, D, V, n, Xsigma, C0, disp, depth)
Arguments
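U |
specified value of U (left singular vectors of the coefficient matrix) |
D |
specified value of D (singular values of the coefficient matrix) |
V |
specified value of V (right singular vectors of the coefficient matrix) |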
n |
sample size |
Xsigma |
covariance matrix used to generate predictors in X |
C0 |
intercept value in the coefficient matrix |
disp |
dispersion parameter of the generative model |
depth |
log of the sequencing depth of the microbiome data (used as an offset in the simulated multivariate negative binomial regression model) |
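To illustrate how these arguments could fit together, here is a rough sketch of a negative binomial generative step with a log link and an offset. It is not taken from the package source. It assumes that `disp` plays the role of `size` in `rnbinom()`, that predictors are drawn as zero-mean Gaussians with covariance `Xsigma`, and that the intercept is the first row of `C0`. The internals of `nbfar_sim` may differ. All sizes are hypothetical toy values.

```r
set.seed(42)
n <- 50; p <- 5; q <- 3   # toy sizes (hypothetical)
disp <- 3                 # ASSUMPTION: used as `size` in rnbinom()
depth <- 2                # log sequencing depth, used as an offset

# Predictors with AR(1)-type covariance, drawn via a Cholesky factor
Xsigma <- 0.5^abs(outer(1:p, 1:p, FUN = "-"))
R <- chol(Xsigma)                        # upper triangular, t(R) %*% R = Xsigma
X <- matrix(rnorm(n * p), n, p) %*% R    # rows have covariance Xsigma

# Coefficients: intercept row stacked on top of a small p x q matrix
C <- matrix(rnorm(p * q, sd = 0.2), p, q)
C0 <- rbind(rep(0.5, q), C)

# Log-link mean, with the offset added to the linear predictor
eta <- depth + cbind(1, X) %*% C0
mu <- exp(eta)

# Negative binomial counts with mean mu and variance mu + mu^2 / disp
Y <- matrix(rnbinom(n * q, size = disp, mu = as.vector(mu)), n, q)
```

Under this parameterization, a larger `disp` brings the counts closer to Poisson, while a larger `depth` scales every mean by the same multiplicative factor.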
Value
Y |
Generated response matrix |
X |
Generated predictor matrix |
References
Mishra, A., Müller, C. (2022) Negative binomial factor regression models with application to microbiome data analysis. https://doi.org/10.1101/2021.11.29.470304
Examples
## Model specification:
SD <- 123
set.seed(SD)
p <- 100; n <- 200
pz <- 0
nrank <- 3 # true rank
rank.est <- 5 # estimated rank
nlam <- 20 # number of tuning parameters
s <- 0.5
q <- 30
control <- nbfar_control() # control parameters
#
#
## Generate data
D <- rep(0, nrank)
V <- matrix(0, ncol = nrank, nrow = q)
U <- matrix(0, ncol = nrank, nrow = p)
#
U[, 1] <- c(sample(c(1, -1), 8, replace = TRUE), rep(0, p - 8))
U[, 2] <- c(rep(0, 5), sample(c(1, -1), 9, replace = TRUE), rep(0, p - 14))
U[, 3] <- c(rep(0, 11), sample(c(1, -1), 9, replace = TRUE), rep(0, p - 20))
#
# setting with responses of a similar type
V[, 1] <- c(rep(0, 8), sample(c(1, -1), 8, replace = TRUE) * runif(8, 0.3, 1), rep(0, q - 16))
V[, 2] <- c(rep(0, 20), sample(c(1, -1), 8, replace = TRUE) * runif(8, 0.3, 1), rep(0, q - 28))
V[, 3] <- c(
sample(c(1, -1), 5, replace = TRUE) * runif(5, 0.3, 1), rep(0, 23),
sample(c(1, -1), 2, replace = TRUE) * runif(2, 0.3, 1), rep(0, q - 30)
)
U[, 1:3] <- apply(U[, 1:3], 2, function(x) x / sqrt(sum(x^2)))
V[, 1:3] <- apply(V[, 1:3], 2, function(x) x / sqrt(sum(x^2)))
#
D <- s * c(4, 6, 5) # signal strength varies with the value of s
or <- order(D, decreasing = TRUE)
U <- U[, or]
V <- V[, or]
D <- D[or]
C <- U %*% (D * t(V)) # simulated coefficient matrix
intercept <- rep(0.5, q) # intercept of the model
C0 <- rbind(intercept, C)
#
Xsigma <- 0.5^abs(outer(1:p, 1:p, FUN = "-"))
# Simulated data (disp is the dispersion parameter, depth the log sequencing depth)
sim.sample <- nbfar_sim(U, D, V, n, Xsigma, C0, disp = 3, depth = 10) # simulated sample
X <- sim.sample$X[1:n, ]
Y <- sim.sample$Y[1:n, ]