riPEER {mdpeer} | R Documentation |
Graph-constrained regression with penalty term being a linear combination of graph-based and ridge penalty terms
Description
Graph-constrained regression with penalty term being a linear combination of graph-based and ridge penalty terms.
See Details for model description and optimization problem formulation.
Usage
riPEER(Q, y, Z, X = NULL, optim.metod = "rootSolve",
rootSolve.x0 = c(1e-05, 1e-05), rootSolve.Q0.x0 = 1e-05, sbplx.x0 = c(1,
1), sbplx.lambda.lo = c(10^(-5), 10^(-5)), sbplx.lambda.up = c(1e+06,
1e+06), compute.boot.CI = FALSE, boot.R = 1000, boot.conf = 0.95,
boot.set.seed = TRUE, boot.parallel = "multicore", boot.ncpus = 4,
verbose = TRUE)
Arguments
Q |
graph-originated penalty matrix |
y |
response values matrix |
Z |
design matrix of covariates whose coefficients are penalized (coefficients b) |
X |
design matrix of covariates whose coefficients are not penalized (coefficients β); defaults to NULL |
optim.metod |
optimization method used to estimate the regularization parameters λ_Q and λ_R; "rootSolve" (default) or "sbplx"
|
rootSolve.x0 |
vector containing initial guesses for λ_Q and λ_R used in "rootSolve" optimization |
rootSolve.Q0.x0 |
vector containing initial guess for the regularization parameter used in "rootSolve" optimization |
sbplx.x0 |
vector containing initial guesses for λ_Q and λ_R used in "sbplx" optimization |
sbplx.lambda.lo |
vector containing minimum values of λ_Q and λ_R considered in "sbplx" optimization |
sbplx.lambda.up |
vector containing maximum values of λ_Q and λ_R considered in "sbplx" optimization |
compute.boot.CI |
logical whether or not to compute bootstrap confidence intervals for b coefficient estimates |
boot.R |
number of bootstrap replications used in bootstrap confidence intervals computation |
boot.conf |
confidence level assumed in bootstrap confidence intervals computation |
boot.set.seed |
logical whether or not to set a seed in bootstrap confidence intervals computation |
boot.parallel |
value of the parallel argument used in bootstrap confidence intervals computation |
boot.ncpus |
value of the ncpus argument used in bootstrap confidence intervals computation |
verbose |
logical whether or not to run in verbose mode (print out function execution messages) |
Details
Estimates coefficients of a linear model of the form:
y = X\beta + Zb + \varepsilon
where:
- y - response,
- X - data matrix of covariates whose coefficients are not penalized,
- Z - data matrix of covariates whose coefficients are penalized,
- \beta - regression coefficients, not penalized in the estimation process,
- b - regression coefficients, penalized in the estimation process, for which a prior graph of similarity / graph of connections is possibly available.
The method uses a penalty that is a linear combination of a graph-based and a ridge penalty term:
\beta_{est}, b_{est} = arg \; min_{\beta,b} \{ (y - X\beta - Zb)^T(y - X\beta - Zb) + \lambda_Q b^TQb + \lambda_R b^Tb \}
where:
- Q - a graph-originated penalty matrix; typically: a graph Laplacian matrix,
- \lambda_Q - regularization parameter for the graph-based penalty term,
- \lambda_R - regularization parameter for the ridge penalty term.
The two regularization parameters, \lambda_Q and \lambda_R, are estimated as maximum likelihood (ML) estimators from an equivalent Linear Mixed Model optimization problem formulation (see: References).
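The exact mixed-model formulation is given in the References; as a hedged sketch, the standard correspondence between penalized least squares and Linear Mixed Models suggests the following reading of the riPEER penalty:

```latex
% With \varepsilon \sim N(0, \sigma^2 I) and random effects b \sim N(0, \Sigma_b),
% the joint (BLUP) criterion for (\beta, b) is
\min_{\beta, b} \; (y - X\beta - Zb)^T (y - X\beta - Zb) + \sigma^2 \, b^T \Sigma_b^{-1} b ,
% so the riPEER penalty corresponds to choosing the random-effects precision such that
\sigma^2 \Sigma_b^{-1} = \lambda_Q Q + \lambda_R I .
```

Under this reading, estimating \lambda_Q and \lambda_R by ML amounts to estimating variance components of the equivalent mixed model.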
The graph-originated penalty term allows imposing similarity between coefficients based on the supplied graph information.
The ridge-originated penalty term facilitates parameter estimation: it reduces computational issues arising from singularity of the graph-originated penalty matrix and yields plausible results when the graph information is not informative.
Bootstrap confidence intervals computation is available (not set as a default option).
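For fixed values of \lambda_Q and \lambda_R (which riPEER itself estimates via the mixed-model formulation above), the criterion has a closed-form minimizer. The following is an illustrative sketch only, not the package's internal routine; the path-graph Laplacian and all parameter values are made up for the example, and the non-penalized X term is omitted:

```r
# For fixed lambda.Q, lambda.R and no X term, the b minimizing
# (y - Zb)'(y - Zb) + lambda.Q * b'Qb + lambda.R * b'b
# solves the linear system (Z'Z + lambda.Q * Q + lambda.R * I) b = Z'y.
set.seed(1)
n <- 50; p <- 4
Z <- matrix(rnorm(n * p), n, p)
y <- Z %*% c(1, 1, 0, 0) + rnorm(n)

# toy graph Laplacian Q for a path graph 1-2-3-4 (hypothetical choice)
A <- matrix(0, p, p)
A[cbind(1:(p - 1), 2:p)] <- 1
A <- A + t(A)
Q <- diag(rowSums(A)) - A

lambda.Q <- 1
lambda.R <- 0.1  # lambda.R > 0 guarantees the system matrix is invertible
b.hat <- solve(crossprod(Z) + lambda.Q * Q + lambda.R * diag(p),
               crossprod(Z, y))
```

The ridge term's role is visible here: even though a graph Laplacian Q is singular, adding lambda.R * diag(p) makes the system matrix positive definite, so the solve always succeeds.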
Value
b.est |
vector of b coefficient estimates |
beta.est |
vector of \beta coefficient estimates |
lambda.Q |
\lambda_Q regularization parameter value |
lambda.R |
\lambda_R regularization parameter value |
lambda.2 |
\lambda_R / \lambda_Q ratio value |
boot.CI |
data frame with two columns containing lower and upper bootstrap confidence interval limits for b coefficient estimates |
obj.fn.val |
optimization problem objective function value |
References
Karas, M., Brzyski, D., Dzemidzic, M., Goñi, J., Kareken, D.A., Randolph, T.W., Harezlak, J. (2017). Brain connectivity-informed regularization methods for regression. doi: https://doi.org/10.1101/117945
Examples
library(mdpeer)
library(ggplot2)
set.seed(1234)
n <- 200
p1 <- 10
p2 <- 90
p <- p1 + p2
# Define graph adjacency matrix
A <- matrix(rep(0, p*p), nrow = p, ncol = p)
A[1:p1, 1:p1] <- 1
A[(p1+1):p, (p1+1):p] <- 1
L <- Adj2Lap(A)
# Define Q penalty matrix as (normalized) graph Laplacian matrix
Q <- L2L.normalized(L)
# Define Z, X design matrices and outcome y
Z <- matrix(rnorm(n*p), nrow = n, ncol = p)
b.true <- c(rep(1, p1), rep(0, p2))
X <- matrix(rnorm(n*3), nrow = n, ncol = 3)
beta.true <- runif(3)
intercept <- 0
eta <- intercept + Z %*% b.true + X %*% beta.true
R2 <- 0.5
sd.eps <- sqrt(var(eta) * (1 - R2) / R2)
error <- rnorm(n, sd = sd.eps)
y <- eta + error
## Not run:
riPEER.out <- riPEER(Q, y, Z, X)
plt.df <- data.frame(x = 1:p, y = riPEER.out$b.est)
ggplot(plt.df, aes(x = x, y = y, group = 1)) + geom_line() + labs(title = "b estimates")
## End(Not run)
## Not run:
# riPEER with 0.95 bootstrap confidence intervals computation
riPEER.out <- riPEER(Q, y, Z, X, compute.boot.CI = TRUE, boot.R = 500)
plt.df <- data.frame(x = 1:p,
y = riPEER.out$b.est,
lo = riPEER.out$boot.CI[,1],
up = riPEER.out$boot.CI[,2])
ggplot(plt.df, aes(x = x, y = y, group = 1)) + geom_line() +
geom_ribbon(aes(ymin=lo, ymax=up), alpha = 0.3)
## End(Not run)