R: Calculate the coefficients for the basis functions

sieve_solver {Sieve}

R Documentation

Calculate the coefficients for the basis functions

Description

This is the main function that performs sieve estimation. It calculates the coefficients by solving a penalized lasso-type problem.
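For the default settings (l1 = TRUE, family = "gaussian"), the penalized problem can be sketched as follows. This assumes the standard glmnet scaling of the squared-error loss and a uniform penalty on every coefficient; the package may weight or exempt some coefficients, so read it as an illustration rather than the exact objective.

```latex
\hat{\beta}_{\lambda}
  = \arg\min_{\beta \in \mathbb{R}^{\mathrm{basisN}}}
    \frac{1}{2n} \sum_{i=1}^{n}
      \Bigl( Y_i - \sum_{j=1}^{\mathrm{basisN}} \Phi_{ij}\,\beta_j \Bigr)^{2}
    + \lambda \sum_{j=1}^{\mathrm{basisN}} \lvert \beta_j \rvert
```

Here n is the training sample size, Phi is the design matrix of basis-function evaluations, and lambda is one value of the penalization hyperparameter. Solving the problem once per value of lambda gives one column of coefficients per hyperparameter.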

Usage

sieve_solver(
  model,
  Y,
  l1 = TRUE,
  family = "gaussian",
  lambda = NULL,
  nlambda = 100
)

Arguments

`model`	a list. Typically, it is the output of Sieve::sieve_preprocess.
`Y`	a vector. The outcome variable. The length of Y equals the training sample size, which should also match the number of rows of X in model.
`l1`	a logical variable. TRUE means calculating the coefficients by solving an l1-penalized empirical risk minimization problem. FALSE means solving a least-squares problem. Default is TRUE.
`family`	a string. 'gaussian', mean-squared-error regression problem; 'binomial', binary outcome, classification problem. Default is 'gaussian'.
`lambda`	same as the lambda of glmnet::glmnet.
`nlambda`	a number. Number of penalization hyperparameters used when solving the lasso-type problem. Default is 100.
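The difference between the two settings of l1 can be illustrated in base R. The sketch below is not the package's implementation: the cosine design matrix, the helper names soft_threshold and lasso_cd, the lambda values, and the choice to leave the constant column unpenalized are all assumptions made for illustration. It solves the l1-penalized problem by plain coordinate descent for a few hyperparameters, then solves the unpenalized least-squares problem on the same design matrix.

```r
set.seed(1)
n <- 200
x <- runif(n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.1)

# Hypothetical cosine design matrix: column j is cos((j - 1) * pi * x),
# so the first column is the constant 1.
basisN <- 10
Phi <- sapply(seq_len(basisN), function(j) cos((j - 1) * pi * x))

# Soft-thresholding operator used by the lasso coordinate update.
soft_threshold <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

# Coordinate descent for
#   (1 / (2n)) * sum((y - Phi %*% b)^2) + lambda * sum(abs(b[-1]))
# (assumption of this sketch: the constant column is not penalized).
lasso_cd <- function(Phi, y, lambda, n_iter = 200) {
  n <- nrow(Phi)
  b <- numeric(ncol(Phi))
  col_ss <- colSums(Phi^2) / n
  r <- y  # residual for b = 0
  for (it in seq_len(n_iter)) {
    for (j in seq_along(b)) {
      # Correlation of column j with the partial residual (column j removed).
      rho <- sum(Phi[, j] * r) / n + col_ss[j] * b[j]
      new_bj <- if (j == 1) rho / col_ss[j] else soft_threshold(rho, lambda) / col_ss[j]
      r <- r - Phi[, j] * (new_bj - b[j])
      b[j] <- new_bj
    }
  }
  b
}

# l1 = TRUE analogue: one column of coefficients per hyperparameter.
lambda <- c(1, 0.1, 0.001)
beta_hat <- sapply(lambda, function(l) lasso_cd(Phi, y, l))

# l1 = FALSE analogue: ordinary least squares on the same design matrix.
beta_ls <- qr.solve(Phi, y)

dim(beta_hat)            # basisN x length(lambda)
colSums(beta_hat != 0)   # number of nonzero coefficients per hyperparameter
```

With the largest hyperparameter every penalized coefficient is shrunk to exactly zero, while smaller values let more basis functions enter the fit. The least-squares solution uses all basis functions at once.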

Value

a list. In addition to the preprocessing information, it also contains the fitted coefficients (beta_hat) and the penalization hyperparameters used (lambda).

`Phi`	a matrix. This is the design matrix used directly in the model-fitting step. The (i,j)-th element of this matrix is the j-th basis function evaluated at the i-th sample's features. The dimension of this matrix is sample size x basisN.
`X`	a matrix. This is the rescaled original feature/predictor matrix.
`beta_hat`	a matrix. Dimension is basisN x nlambda. The j-th column corresponds to the fitted regression coefficients using the j-th hyperparameter in lambda.
`type`	a string. The type of basis function.
`index_matrix`	a matrix. It specifies which product basis functions are used when constructing the design matrix Phi. Its dimension is basisN x dimension of original features. Each row has at most interaction_order elements not equal to 1.
`basisN`	a number. Number of sieve basis functions.
`norm_para`	a matrix. It records how each dimension of the feature/predictor is rescaled, which is useful when rescaling the testing sample's predictors.
`lambda`	a vector. It records the penalization hyperparameters used when solving the lasso problems. By default it has length 100, meaning the algorithm tried 100 different penalization hyperparameters.
`family`	a string. 'gaussian', continuous numerical outcome, regression problem; 'binomial', binary outcome, classification problem.
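The documented shapes of Phi, beta_hat and lambda suggest how the components fit together. The sketch below uses a stand-in list filled with random numbers, not real sieve_solver output, so that it runs without the package. It also assumes that, for family = 'gaussian', the fitted values for the j-th hyperparameter are Phi %*% beta_hat[, j]; that relation is an assumption of this example and is not stated above.

```r
set.seed(2)
n <- 50
basisN <- 4
nlambda <- 3

# Stand-in for a fitted object: hypothetical values, documented shapes only.
fit <- list(
  Phi      = cbind(1, matrix(rnorm(n * (basisN - 1)), n)),     # n x basisN
  beta_hat = matrix(rnorm(basisN * nlambda), basisN, nlambda), # basisN x nlambda
  lambda   = c(0.1, 0.01, 0.001),
  basisN   = basisN
)
Y <- rnorm(n)  # stand-in outcome vector

# The shapes must line up as described in the Value section.
stopifnot(nrow(fit$beta_hat) == ncol(fit$Phi),
          ncol(fit$beta_hat) == length(fit$lambda))

# Assumed relation: column j holds the fitted values for lambda[j].
fitted_all <- fit$Phi %*% fit$beta_hat       # n x nlambda
train_mse  <- colMeans((Y - fitted_all)^2)   # training error per hyperparameter
n_active   <- colSums(fit$beta_hat != 0)     # nonzero coefficients per column

summary_table <- data.frame(lambda = fit$lambda, train_mse, n_active)
summary_table
```

A table like this is a quick way to see how model size changes along the hyperparameter path. Training error alone should not be used to choose a hyperparameter; held-out data is needed for that.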

Examples

xdim <- 1 #1 dimensional feature
#generate 1000 training samples
TrainData <- GenSamples(s.size = 1000, xdim = xdim)
#use 50 cosine basis functions
type <- 'cosine'
basisN <- 50 
sieve.model <- sieve_preprocess(X = TrainData[,2:(xdim+1)], 
                                basisN = basisN, type = type)
sieve.fit <- sieve_solver(model = sieve.model, Y = TrainData$Y)

###if the outcome is binary, 
###need to solve a nonparametric logistic regression problem
xdim <- 1
TrainData <- GenSamples(s.size = 1e3, xdim = xdim, y.type = 'binary', frho = 'nonlinear_binary')
sieve.model <- sieve_preprocess(X = TrainData[,2:(xdim+1)], 
                                basisN = basisN, type = type)
sieve.fit <- sieve_solver(model = sieve.model, Y = TrainData$Y,
                          family = 'binomial')

[Package Sieve version 2.1 Index]