npmr {npmr}    R Documentation
Nuclear penalized multinomial regression
Description
Fit a multinomial logistic regression model for a sequence of regularization parameters penalizing the nuclear norm of the regression coefficient matrix.
Usage
npmr(X, Y, lambda = exp(seq(7, -2)), s = 0.1/max(X), eps = 1e-06, group = NULL,
accelerated = TRUE, B.init = NULL, b.init = NULL, quiet = TRUE)
Arguments
X
Covariate matrix. May be in sparse form (e.g. a sparse matrix from the Matrix package).
Y
Multinomial response. May be (1) a vector of length equal to nrow(X), which will be interpreted as a factor whose levels represent the response classes; or (2) a matrix with nrow(Y) = nrow(X), in which each row has exactly one entry equal to 1, indicating that observation's response class, and all other entries zero (see the illustration following this argument list).
lambda
Vector of regularization parameter values penalizing the nuclear norm. Default is a wide range of values. We suggest that the user choose this sequence via trial and error. If the model takes too long to fit, try larger values of lambda.
s
Step size for proximal gradient descent.
eps
Convergence threshold. When the relative change in the objective function after an iteration drops below this threshold, the algorithm halts.
group
Vector of length equal to the number of variables (ncol(X) and nrow(B)). Variables in the same group, indexed by a POSITIVE integer, are penalized together (the nuclear norm of the corresponding sub-matrix of the regression coefficients is penalized). Variables not assigned a positive integer are NOT penalized. Default is NULL, meaning there are no sub-groups and the nuclear norm of the entire coefficient matrix is penalized (see the illustration below).
accelerated
Logical. Should accelerated proximal gradient descent be used? Default is TRUE.
B.init
Initial value of the regression coefficient matrix for proximal gradient descent.
b.init
Initial value of the regression intercept vector for proximal gradient descent.
quiet
Logical. Should output be silenced? If FALSE, the value of the objective function is printed after each step of proximal gradient descent, which may be useful for debugging. Default is TRUE.
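As a small illustration of the two accepted forms of Y and of the group argument, consider the following sketch with made-up data (the object names y, Ymat and grp are hypothetical, not part of the package):

y = factor(c("a", "b", "a", "c")) # form (1): a factor of class labels
Ymat = model.matrix(~ y - 1) # form (2): indicator matrix with exactly one 1 per row
# For a covariate matrix with ncol(X) = 5: variables 1-2 and 3-4 form two
# penalized groups, while variable 5 (not assigned a positive integer) is unpenalized
grp = c(1, 1, 2, 2, 0)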
Details
In multinomial regression (in contrast with Gaussian regression or logistic regression) there is a matrix of regression coefficients, not just a vector. NPMR fits a multinomial logistic regression with a penalty on the nuclear norm of this regression coefficient matrix B. Specifically, the objective is
-loglik(B,b|X,Y) + lambda*||B||_*
where ||.||_* denotes the nuclear norm. This implementation solves the problem using proximal gradient descent, which iteratively steps in the direction of the negative gradient of the loss function and soft-thresholds the singular values of the result.
Through the group argument, this function makes available the option of dividing the regression coefficient matrix into sub-matrices (by row) and penalizing the sum of the nuclear norms of these sub-matrices. Rows (corresponding to variables) can be left unpenalized in this way.
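To make the update concrete, here is a minimal sketch of a single proximal gradient step under the objective above; it is not the package's internal implementation, and the function name prox_step is hypothetical:

prox_step = function(B, b, X, Y, lambda, s) {
    eta = sweep(X %*% B, 2, b, "+") # linear predictor, n x K
    P = exp(eta)/rowSums(exp(eta)) # fitted class probabilities
    grad = crossprod(X, P - Y) # gradient of -loglik with respect to B
    sv = svd(B - s*grad) # SVD of the gradient-step result
    d = pmax(sv$d - s*lambda, 0) # soft-threshold the singular values
    list(B = sv$u %*% (d * t(sv$v)), # reconstruct U diag(d) V'
         b = b - s*colSums(P - Y)) # unpenalized gradient step on the intercepts
}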
Value
An object of class “npmr” with values:
call
the call that produced this object
B
A 3-dimensional array with dimensions (ncol(X), ncol(Y), length(lambda)), storing the fitted regression coefficient matrix for each value of lambda.
b
A matrix of fitted intercepts, containing one intercept vector of length ncol(Y) for each value of lambda.
objective
A vector of length equal to the length of lambda, giving the value of the objective function at the solution for each value of the regularization parameter.
lambda
The input sequence of values for the regularization parameter, for each of which NPMR has been solved.
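For instance, with a fit as in the Examples below, these components can be inspected along the following lines (a sketch; the dimension comment assumes the array layout described above):

fit = npmr(X, Y, lambda = exp(seq(7, -2)))
dim(fit$B) # ncol(X) x ncol(Y) x length(lambda)
fit$objective # objective function value at the solution for each lambda
fit$lambda # the sequence of regularization parameters that was fit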
Author(s)
Scott Powers, Trevor Hastie, Rob Tibshirani
References
Scott Powers, Trevor Hastie and Rob Tibshirani (2016). “Nuclear penalized multinomial regression with an application to predicting at bat outcomes in baseball.” In prep.
See Also
cv.npmr, predict.npmr, print.npmr, plot.npmr
Examples
# Fit NPMR to simulated data
K = 5 # number of response classes
n = 1000 # number of training observations
m = 10000 # number of test observations
p = 10 # number of covariates
r = 2 # rank of the true coefficient matrix
# Simulated training data
set.seed(8369)
A = matrix(rnorm(p*r), p, r)
C = matrix(rnorm(K*r), K, r)
B = tcrossprod(A, C) # low-rank coefficient matrix
X = matrix(rnorm(n*p), n, p) # covariate matrix with iid Gaussian entries
eta = X %*% B # linear predictor (no intercept in this simulation)
P = exp(eta)/rowSums(exp(eta)) # true class probabilities
Y = t(apply(P, 1, rmultinom, n = 1, size = 1)) # one-hot response matrix
# Simulated test data
Xtest = matrix(rnorm(m*p), m, p)
etatest = Xtest %*% B
Ptest = exp(etatest)/rowSums(exp(etatest))
Ytest = t(apply(Ptest, 1, rmultinom, n = 1, size = 1))
# Fit NPMR for a sequence of lambda values without CV:
fit = npmr(X, Y, lambda = exp(seq(7, -2)))
# Print the NPMR fit:
fit
# Produce a biplot:
plot(fit, lambda = 20)
# Compute mean test error using the predict function (for each value of lambda):
getloss = function(pred, Y) {
    -mean(log(rowSums(Y*pred))) # mean negative log-likelihood on the test set
}
apply(predict(fit, Xtest), 3, getloss, Ytest)