sieve_solver {Sieve} | R Documentation |
Calculate the coefficients for the basis functions
This is the main function that performs sieve estimation. It calculates the coefficients by solving a penalized lasso-type problem.
sieve_solver(
  model,
  Y,
  l1 = TRUE,
  family = "gaussian",
  lambda = NULL,
  nlambda = 100
)
model |
a list. Typically, it is the output of Sieve::sieve_preprocess. |
Y |
a vector. The outcome variable. The length of Y equals the training sample size, which should also match the number of rows of X in model. |
l1 |
a logical variable. TRUE means calculating the coefficients by solving an l1-penalized empirical risk minimization problem. FALSE means solving a least-squares problem. Default is TRUE. |
family |
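a string. 'gaussian': mean-squared-error regression problem for a continuous outcome; 'binomial': classification problem for a binary outcome. Default is 'gaussian'. |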
lambda |
same as the lambda of glmnet::glmnet. |
nlambda |
a number. Number of penalization hyperparameters used when solving the lasso-type problem. Default is 100. |
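The difference between l1 = TRUE and l1 = FALSE can be illustrated without the package. The sketch below is not the Sieve implementation. It assumes a univariate cosine basis whose first function is the constant 1 and whose j-th function is cos((j - 1) * pi * x) on [0, 1], builds a small design matrix by hand, and solves the unpenalized least-squares problem with base R. The helper name cosine_design and the simulated data are hypothetical.

```r
# Hypothetical helper: n x basisN matrix of cosine basis evaluations.
# Assumes x has already been rescaled to [0, 1].
cosine_design <- function(x, basisN) {
  # column j holds cos((j - 1) * pi * x); column 1 is the constant 1
  outer(x, seq_len(basisN) - 1, function(xi, k) cos(k * pi * xi))
}

set.seed(1)
n <- 200
x <- runif(n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.1)  # simulated outcome

Phi <- cosine_design(x, basisN = 10)

# Unpenalized least squares: minimize sum((y - Phi %*% beta)^2)
beta_ls <- qr.solve(Phi, y)

fitted_ls <- drop(Phi %*% beta_ls)
mse_ls <- mean((y - fitted_ls)^2)          # training mean squared error
```

With l1 = TRUE the objective would additionally carry a penalty proportional to sum(abs(beta)), solved over a sequence of nlambda hyperparameter values rather than once.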
a list. In addition to the preprocessing information, it also contains the fitted coefficients (beta_hat).
Phi |
a matrix. This is the design matrix used directly in the subsequent model-fitting step. The (i,j)-th element of this matrix is the j-th basis function evaluated at the i-th sample's features. The dimension of this matrix is sample size x basisN. |
X |
a matrix. This is the rescaled original feature/predictor matrix. |
beta_hat |
a matrix. Dimension is basisN x nlambda. The j-th column corresponds to the fitted regression coefficients using the j-th hyperparameter in lambda. |
type |
a string. The type of basis function. |
index_matrix |
a matrix. It specifies which product basis functions are used when constructing the design matrix Phi. Its dimension is basisN x the dimension of the original features. There are at most interaction_order non-1 elements in each row. |
basisN |
a number. Number of sieve basis functions. |
norm_para |
a matrix. It records how each dimension of the feature/predictor is rescaled, which is useful when rescaling the testing sample's predictors. |
lambda |
a vector. It records the penalization hyperparameters used when solving the lasso problems. By default it has length 100, meaning the algorithm tried 100 different penalization hyperparameters. |
family |
a string. 'gaussian', continuous numerical outcome, regression problem; 'binomial', binary outcome, classification problem. |
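To make the roles of index_matrix, Phi, and beta_hat concrete, here is a small base-R sketch. It is an illustration under assumptions, not the package's code: the univariate basis is taken to be cosine, with index 1 mapping to the constant function (so non-1 entries mark the features that actually enter a product), and fitted values are assumed to be Phi %*% beta_hat, which is consistent with the dimensions stated above. The helper names psi and product_design, and all numbers, are hypothetical.

```r
# Hypothetical univariate basis: index 1 is the constant function
psi <- function(x, index) cos((index - 1) * pi * x)

# Hypothetical helper: build Phi (n x basisN) from X (n x d) and
# index_matrix (basisN x d), with
# Phi[i, j] = prod over k of psi(X[i, k], index_matrix[j, k])
product_design <- function(X, index_matrix) {
  Phi <- matrix(1, nrow = nrow(X), ncol = nrow(index_matrix))
  for (j in seq_len(nrow(index_matrix))) {
    for (k in seq_len(ncol(X))) {
      Phi[, j] <- Phi[, j] * psi(X[, k], index_matrix[j, k])
    }
  }
  Phi
}

set.seed(2)
X <- matrix(runif(5 * 2), nrow = 5)     # 5 samples, 2 rescaled features
index_matrix <- rbind(c(1, 1),          # constant term
                      c(2, 1),          # depends on feature 1 only
                      c(1, 2),          # depends on feature 2 only
                      c(2, 2))          # interaction of both features
Phi <- product_design(X, index_matrix)  # 5 x 4

# One column of coefficients per penalization hyperparameter (made-up values)
beta_hat <- cbind(c(0.5, 0, 0, 0), c(0.5, 1, -1, 0.2))
fitted <- Phi %*% beta_hat              # 5 x 2: one column per lambda
```

Each column of fitted corresponds to one entry of lambda, so choosing a hyperparameter amounts to choosing a column.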
xdim <- 1 #1 dimensional feature
#generate 1000 training samples
TrainData <- GenSamples(s.size = 1000, xdim = xdim)
#use 50 cosine basis functions
type <- 'cosine'
basisN <- 50
sieve.model <- sieve_preprocess(X = TrainData[,2:(xdim+1)],
                                basisN = basisN, type = type)
#solve for the coefficients; the name sieve.fit is illustrative
sieve.fit <- sieve_solver(model = sieve.model, Y = TrainData$Y)
###if the outcome is binary,
###need to solve a nonparametric logistic regression problem
xdim <- 1
TrainData <- GenSamples(s.size = 1e3, xdim = xdim, y.type = 'binary', frho = 'nonlinear_binary')
sieve.model <- sieve_preprocess(X = TrainData[,2:(xdim+1)],
                                basisN = basisN, type = type)
#solve for the coefficients; the name sieve.fit is illustrative
sieve.fit <- sieve_solver(model = sieve.model, Y = TrainData$Y,
                          family = 'binomial')