R: Negative binomial reduced rank regression (NBRRR)

nbrrr {nbfar}

R Documentation

Negative binomial reduced rank regression (NBRRR)

Description

In the range of 1 to maxrank, the estimation procedure selects the rank r of the coefficient matrix using a cross-validation approach. For the selected rank, a rank r coefficient matrix is estimated that best fits the observations.

Usage

nbrrr(
  Yt,
  X,
  maxrank = 10,
  cIndex = NULL,
  ofset = NULL,
  control = list(),
  nfold = 5,
  trace = FALSE,
  verbose = TRUE
)

Arguments

`Yt`	response matrix
`X`	design matrix; when X = NULL, we set X as identity matrix and perform generalized PCA.
`maxrank`	an integer specifying the maximum possible rank of the coefficient matrix or the number of factors
`cIndex`	specify index of control variable in the design matrix X
`ofset`	offset matrix or microbiome data analysis specific scaling: common sum scaling = CSS (default), total sum scaling = TSS, median-ratio scaling = MRS, centered-log-ratio scaling = CLR
`control`	a list of internal parameters controlling the model fitting
`nfold`	number of folds in k-fold crossvalidation
`trace`	TRUE/FALSE checking progress of cross validation error
`verbose`	TRUE/FALSE checking progress of estimation procedure

Value

`C`	estimated coefficient matrix
`Z`	estimated control variable coefficient matrix
`PHI`	estimted dispersion parameters
`U`	estimated U matrix (generalize latent factor weights)
`D`	estimated singular values
`V`	estimated V matrix (factor loadings)

References

Mishra, A., Müller, C. (2022) Negative binomial factor regression models with application to microbiome data analysis. https://doi.org/10.1101/2021.11.29.470304

Examples


## Model specification:
SD <- 123
set.seed(SD)
p <- 50; n <- 200
pz <- 0
nrank <- 3                # true rank
rank.est <- 5             # estimated rank
nlam <- 20                # number of tuning parameter
s  = 0.5
q <- 30
control <- nbfar_control()  # control parameters
#
#
## Generate data
D <- rep(0, nrank)
V <- matrix(0, ncol = nrank, nrow = q)
U <- matrix(0, ncol = nrank, nrow = p)
#
U[, 1] <- c(sample(c(1, -1), 8, replace = TRUE), rep(0, p - 8))
U[, 2] <- c(rep(0, 5), sample(c(1, -1), 9, replace = TRUE), rep(0, p - 14))
U[, 3] <- c(rep(0, 11), sample(c(1, -1), 9, replace = TRUE), rep(0, p - 20))
#
  # for similar type response type setting
  V[, 1] <- c(rep(0, 8), sample(c(1, -1), 8,
    replace =
      TRUE
  ) * runif(8, 0.3, 1), rep(0, q - 16))
  V[, 2] <- c(rep(0, 20), sample(c(1, -1), 8,
    replace =
      TRUE
  ) * runif(8, 0.3, 1), rep(0, q - 28))
  V[, 3] <- c(
    sample(c(1, -1), 5, replace = TRUE) * runif(5, 0.3, 1), rep(0, 23),
    sample(c(1, -1), 2, replace = TRUE) * runif(2, 0.3, 1), rep(0, q - 30)
  )
U[, 1:3] <- apply(U[, 1:3], 2, function(x) x / sqrt(sum(x^2)))
V[, 1:3] <- apply(V[, 1:3], 2, function(x) x / sqrt(sum(x^2)))
#
D <- s * c(4, 6, 5) # signal strength varries as per the value of s
or <- order(D, decreasing = TRUE)
U <- U[, or]
V <- V[, or]
D <- D[or]
C <- U %*% (D * t(V)) # simulated coefficient matrix
intercept <- rep(0.5, q) # specifying intercept to the model:
C0 <- rbind(intercept, C)
#
Xsigma <- 0.5^abs(outer(1:p, 1:p, FUN = "-"))
# Simulated data
sim.sample <- nbfar_sim(U, D, V, n, Xsigma, C0,disp = 3, depth = 10)  # Simulated sample
# Dispersion parameter
X <- sim.sample$X[1:n, ]
Y <- sim.sample$Y[1:n, ]
X0 <- cbind(1, X)                     # 1st column accounting for intercept

# Model with known offset
set.seed(1234)
offset <- log(10)*matrix(1,n,q)
control_nbrr <- nbfar_control(initmaxit = 5000, initepsilon = 1e-4)
# nbrrr_test <- nbrrr(Y, X, maxrank = 5, cIndex = NULL, ofset = offset,
#                       control = control_nbrr, nfold = 5)

[Package nbfar version 0.1 Index]