R: Fitting Second-order Tensor Generalized Regression

tensorReg2D {TensorTest2D}

R Documentation

Fitting Second-order Tensor Generalized Regression

Description

tensorReg2D fits a second-order tensor generalized regression model. It mainly focuses on parameter estimation, including the coefficients and their standard deviations. The function is built on the alternating least squares algorithm, and two stopping criteria are provided (see Details). Model complexity measures, namely AIC and BIC, are also reported.

Usage

tensorReg2D(y, X, W = NULL, n_R, family, opt = 1, max_ite = 100, tol = 10^(-7))

Arguments

`y`	A numerical vector. Dependent variable.
`X`	A numerical 3-D array. Independent variable (3-D tensor).
`W`	A numerical matrix. Independent variable.
`n_R`	A numerical constant. A predefined value that determines the rank of the approximating matrices (see Details).
`family`	Family of the generalized linear model. Three options are available (see Details).
`opt`	Optimization option. Selects one of two stopping criteria, opt = 1 or opt = 2 (see Details).
`max_ite`	Maximum iteration. The value of maximum iterations for the algorithm.
`tol`	Tolerance. The value of tolerance with respect to optimization.

Details

tensorReg2D focuses on second-order tensor generalized regression problems. More specifically, it provides statistical inference for the input variables. The function is not restricted to the second-order tensor input X; it can also include other numerical variables W.
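As a rough illustration of how the two kinds of input could enter the model, the sketch below builds a linear predictor from a matrix W and a 3-D array X using base R only. It assumes that the tensor term for observation i is the elementwise product of the slice X[, , i] with B, summed over all entries. The helper name tensor_term and all simulated values are hypothetical and not part of the package.

```r
# Hypothetical helper (not part of TensorTest2D): for each observation i,
# sum the elementwise product of the P-by-G slice X[, , i] and B.
tensor_term <- function(X, B) {
  apply(X, 3, function(Xi) sum(Xi * B))
}

set.seed(1)
n <- 5; P <- 2; G <- 3
X <- array(rnorm(P * G * n), dim = c(P, G, n))  # P-by-G-by-n tensor
W <- cbind(1, rnorm(n))                         # intercept plus one covariate
b <- c(0.5, -1)                                 # coefficients for W
B <- matrix(0.1, P, G)                          # coefficients for X

# Linear predictor: one value per observation.
eta <- as.vector(W %*% b) + tensor_term(X, B)
length(eta)
```

Under this assumption, each observation contributes one scalar from W and one scalar from its P-by-G slice of X.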

Since tensorReg2D is based on the alternating least squares algorithm, the following arguments must be set in advance to obtain a good optimization result.

n_R: For regression on an order-2 tensor predictor stored as a P-by-G-by-n array, the unknown P-by-G parameter matrix B is written as the product of two matrices, B_1 (P-by-R) and t(B_2) (R-by-G), so that B can be estimated by iteratively updating B_1 and B_2. Here n_R is R, the number of columns of B_1 and B_2, and therefore an upper bound on the rank of the approximation to B. It must satisfy 1 <= n_R <= min(P, G); an appropriate choice of n_R makes it possible to estimate the unknown parameter matrix. The signature in Usage gives n_R no default value, so it must be supplied; the Examples use n_R = 1.
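The factorization can be pictured with a few lines of base R. This is only a sketch with arbitrary simulated factors; the dimensions and variable names are chosen for illustration.

```r
set.seed(1)
P <- 3; G <- 8; R <- 2
B_1 <- matrix(rnorm(P * R), P, R)  # P-by-R factor
B_2 <- matrix(rnorm(G * R), G, R)  # G-by-R factor
B <- B_1 %*% t(B_2)                # P-by-G matrix of rank at most R

dim(B)
qr(B)$rank                               # numerical rank, at most R
c(full = P * G, factored = R * (P + G))  # entries stored in each form
```

A smaller n_R gives a more restrictive model for B, while n_R = min(P, G) places no rank restriction on it.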

opt: An iterative algorithm needs a stopping criterion, and tensorReg2D offers two. If opt = 1, execution stops when the maximum elementwise difference between the current and preceding estimates of the parameter matrix B and the parameter vector b is less than the predefined tolerance (tol). If opt = 2, execution stops when the maximum elementwise difference between the current and preceding estimates of the factor matrices B_1 and B_2 and the parameter vector b is less than the predefined tolerance (tol).
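The two criteria might be sketched as below. This assumes that "maximum difference" means the largest absolute elementwise change between consecutive iterations. All function names are hypothetical, and the package's internal implementation may differ.

```r
# Largest absolute elementwise change between two iterates.
max_change <- function(new, old) max(abs(new - old))

# opt = 1: compare the product B = B_1 %*% t(B_2) and the vector b.
converged_opt1 <- function(B_new, B_old, b_new, b_old, tol = 1e-7) {
  max(max_change(B_new, B_old), max_change(b_new, b_old)) < tol
}

# opt = 2: compare the factors B_1 and B_2 themselves, and the vector b.
converged_opt2 <- function(B1_new, B1_old, B2_new, B2_old,
                           b_new, b_old, tol = 1e-7) {
  max(max_change(B1_new, B1_old),
      max_change(B2_new, B2_old),
      max_change(b_new, b_old)) < tol
}

# Rescaling the factors leaves B unchanged but moves B_1 and B_2,
# so the two checks can disagree on the same pair of iterates.
B1 <- matrix(1, 3, 1); B2 <- matrix(2, 4, 1); b <- 0.5
B <- B1 %*% t(B2)
check1 <- converged_opt1(B, (2 * B1) %*% t(B2 / 2), b, b)
check2 <- converged_opt2(2 * B1, B1, B2 / 2, B2, b, b)
c(opt1 = check1, opt2 = check2)
```

In this sketch the check on B is insensitive to how the product is split between B_1 and B_2, whereas the check on the factors is not.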

family: tensorReg2D provides three options for the type of generalized regression problem. family = "gaussian" uses the identity link and corresponds to linear regression, where the dependent variable is real-valued. family = "binomial" uses the logit link and corresponds to logistic regression, where the dependent variable is binary (zero or one). family = "poisson" uses the log link and corresponds to Poisson regression, where the dependent variable is a non-negative integer.
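To make the three link functions concrete, the following sketch maps a few values of the linear predictor to the mean scale using the family objects from base R's stats package. It illustrates the links only and does not call tensorReg2D; the values of eta are arbitrary.

```r
eta <- c(-2, 0, 2)  # example values of the linear predictor

# stats family objects expose the inverse link as `linkinv`.
mu_gaussian <- gaussian()$linkinv(eta)  # identity: mean equals eta
mu_binomial <- binomial()$linkinv(eta)  # inverse logit: probabilities in (0, 1)
mu_poisson  <- poisson()$linkinv(eta)   # exp: positive means

rbind(mu_gaussian, mu_binomial, mu_poisson)
```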

max_ite: The maximum number of iterations must be set in advance. By default, max_ite = 100.

tol: The tolerance used by the stopping criterion (opt) must be set in advance. By default, tol = 10^(-7).
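The way these arguments interact can be seen in a minimal sketch of alternating least squares for the Gaussian case without W. It relies on the fact that the inner product of X_i with B_1 %*% t(B_2) is linear in B_1 when B_2 is fixed, and linear in B_2 when B_1 is fixed. The function name als_gaussian, the random starting value, and the simulated data are assumptions made for illustration; this is not the package's code and it computes no standard deviations or p-values.

```r
# Hypothetical sketch of alternating least squares for
# y_i = <X_i, B_1 %*% t(B_2)> + error (Gaussian family, no W).
als_gaussian <- function(y, X, R = 1, max_ite = 100, tol = 1e-7) {
  P <- dim(X)[1]; G <- dim(X)[2]; n <- dim(X)[3]
  B_2 <- matrix(rnorm(G * R), G, R)  # random starting value
  B_old <- matrix(0, P, G)
  for (ite in seq_len(max_ite)) {
    # Fix B_2: <X_i, B_1 t(B_2)> = sum((X_i %*% B_2) * B_1), linear in B_1.
    Z1 <- matrix(apply(X, 3, function(Xi) as.vector(Xi %*% B_2)),
                 nrow = n, byrow = TRUE)
    B_1 <- matrix(qr.solve(Z1, y), P, R)
    # Fix B_1: <X_i, B_1 t(B_2)> = sum((t(X_i) %*% B_1) * B_2), linear in B_2.
    Z2 <- matrix(apply(X, 3, function(Xi) as.vector(t(Xi) %*% B_1)),
                 nrow = n, byrow = TRUE)
    B_2 <- matrix(qr.solve(Z2, y), G, R)
    B <- B_1 %*% t(B_2)
    if (max(abs(B - B_old)) < tol) break  # check on B, in the spirit of opt = 1
    B_old <- B
  }
  list(B = B, ite = ite)
}

set.seed(1)
n <- 200; P <- 3; G <- 6
X <- array(rnorm(P * G * n), dim = c(P, G, n))
B_true <- c(1, -1, 0.5) %*% t(seq(-1, 1, length.out = G))  # rank-1 truth
y <- apply(X, 3, function(Xi) sum(Xi * B_true)) + rnorm(n, sd = 0.1)

fit <- als_gaussian(y, X, R = 1)
fit$ite                   # iterations used, at most max_ite
max(abs(fit$B - B_true))  # largest elementwise estimation error
```

Each pass solves two ordinary least squares problems, so max_ite bounds the number of passes and tol controls when successive estimates are considered equal.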

Value

tensorReg2D returns an object of class "tsglm".

The function summary.tsglm, a customized method for the generic function summary, can be used to obtain and print a summary and analysis of variance table of the results.

An object of class tsglm is a list containing at least the following components:

ite: The number of iterations executed when the function stopped.

b_EST: The estimated coefficients for numerical variables.

b_SD: The estimated standard deviation for numerical variables.

b_PV: The p-value for numerical variables.

B_EST: The estimated coefficients for 3-D tensor variables.

B_SD: The estimated standard deviation for 3-D tensor variables.

B_PV: The p-value for 3-D tensor variables.

Residuals: The differences between true values and predicted values. Provided for family = "gaussian".

Dev_res: Deviance residuals for glm. Provided for models other than family = "gaussian".

Dev: The values of the null deviance and the residual deviance. Provided for models other than family = "gaussian".

IC: The value of AIC and BIC.

DoF: Degrees of freedom.

call: The formula of the fitted model.

family: The family of the model.

Author(s)

Sheng-Mao Chang

References

Mengyun Wu, Jian Huang, and Shuangge Ma (2017). Identifying gene-gene interactions using penalized tensor regression.

Sheng-Mao Chang, Meng Yang, Wenbin Lu, Yu-Jyun Huang, Yueyang Huang, Hung Hung, Jeffrey C Miecznikowski, Tzu-Pin Lu, and Jung-Ying Tzeng (2021). Gene-set integrative analysis of multi-omics data using tensor-based association test. Bioinformatics, btab125.

Examples

# Simulation data
n <- 500 # number of observations
n_P <- 3; n_G <- 64 # dimensions of each slice of the 3-D tensor variable
n_d <- 1 # number of numerical variables; if n_d == 1, the only numerical variable is the intercept
beta_True <- rep(1, n_d)
B_True <- c(1,1,1)%*%t(rnorm(n_G)) + c(0, .5, .5)%*%t(rnorm(n_G))
B_True <- B_True / 10
W <- matrix(rnorm(n*n_d), n, n_d); W[,1] <- 1
X <- array(rnorm(n*n_P*n_G), dim=c(n_P, n_G, n))
## Regression
y_R <- as.vector(W%*%beta_True + X%hp%B_True + rnorm(n))
DATA_R <- list(y = y_R, X = X, W = W)
## Binomial
p_B <- exp(W%*%beta_True + X%hp%B_True); p_B <- p_B/(1+p_B)
y_B <- rbinom(n, 1, p_B)
DATA_B <- list(y = y_B, W = W, X = X)
## Poisson
p_P <- exp(W%*%beta_True + X%hp%B_True)
y_P <- rpois(n, p_P)
y_P[which(y_P > 170)] <- 170 # If y_P > 170, factorial(y_P) == Inf.
DATA_P <- list(y = y_P, W = W, X = X)

# Execution
## Regression
result_R <- tensorReg2D(y = DATA_R$y, X = DATA_R$X, W = NULL, n_R = 1, family = "gaussian",
                        opt = 1, max_ite = 100, tol = 10^(-7))
## Visualization
image(B_True);image(result_R$B_EST)
head(predict(result_R, DATA_R$X))

## Binomial
result_B <- tensorReg2D(y = DATA_B$y, X = DATA_B$X, W = NULL, n_R = 1, family = "binomial",
                        opt = 1, max_ite = 100, tol = 10^(-7))
## Visualization
image(B_True);image(result_B$B_EST)
head(predict(result_B, DATA_B$X))

## Poisson
result_P <- tensorReg2D(y = DATA_P$y, X = DATA_P$X, W = NULL, n_R = 1, family = "poisson",
                        opt = 1, max_ite = 100, tol = 10^(-7))
## Visualization
image(B_True);image(result_P$B_EST)
head(predict(result_P, DATA_P$X))

[Package TensorTest2D version 1.1.2 Index]