makessg {bigsplines}    R Documentation

Makes Objects to Fit Generalized Smoothing Spline ANOVA Models

Description

This function creates a list containing the necessary information to fit a generalized smoothing spline ANOVA model (see bigssg).

Usage

makessg(formula,family,data,type=NULL,nknots=NULL,rparm=NA,
        lambdas=NULL,skip.iter=TRUE,se.lp=FALSE,rseed=1234,
        gcvopts=NULL,knotcheck=TRUE,gammas=NULL,weights=NULL,
        gcvtype=c("acv","gacv","gacv.old"))

Arguments

formula

An object of class "formula": a symbolic description of the model to be fitted (see Details and Examples for more information).

family

Distribution for response. One of five options: "binomial", "poisson", "Gamma", "inverse.gaussian", or "negbin". See bigssg.

data

Optional data frame, list, or environment containing the variables in formula.

type

List of smoothing spline types for predictors in formula (see Details). Options include type="cub" for cubic, type="acub" for another cubic, type="per" for cubic periodic, type="tps" for cubic thin-plate, and type="nom" for nominal.
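For example (an illustrative fragment with hypothetical predictor names), a model with one continuous predictor and one unordered factor might specify

    type=list(x1v="cub", grp="nom")

so that x1v is fit with a cubic spline and grp enters as a nominal effect.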

nknots

Two possible options: (a) scalar giving total number of random knots to sample, or (b) vector indexing which rows of data to use as knots.
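For example (illustrative values), nknots=100 samples 100 random knots, whereas a vector of row indices such as

    nknots=seq(1, 1000, by=20)

uses every 20th row of a 1000-row data set as a knot.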

rparm

List of rounding parameters for each predictor. See Details.
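For example (an illustrative fragment with hypothetical predictor names), rounding two continuous predictors to the nearest 0.01 could be requested via

    rparm=list(x1v=0.01, x2v=0.01)

which can substantially reduce the computational burden for large samples (see Helwig and Ma, 2016).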

lambdas

Vector of global smoothing parameters to try. Default uses lambdas=10^-c(9:0).
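A finer grid may be supplied if desired, e.g. (illustrative values)

    lambdas=10^seq(-9, 0, by=0.5)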

skip.iter

Logical indicating whether to skip the iterative smoothing parameter update. Using skip.iter=FALSE should provide a solution closer to the optimum, but the fitting time may be substantially longer (see the Computational Details section of bigssg).

se.lp

Logical indicating if the standard errors of the linear predictors (eta) should be estimated.

rseed

Random seed for knot sampling. Input is ignored if nknots is an input vector of knot indices. Set rseed=NULL to obtain a different knot sample each time, or set rseed to any positive integer to use a different seed than the default.

gcvopts

Control parameters for optimization. List with 6 elements: (i) maxit: maximum number of outer iterations, (ii) gcvtol: convergence tolerance for the iterative GACV update, (iii) alpha: tuning parameter for GACV minimization, (iv) inmaxit: maximum number of inner iterations for iteratively reweighted fitting, (v) intol: inner convergence tolerance for iteratively reweighted fitting, and (vi) insub: number of data points to subsample when checking inner convergence. Default: gcvopts=list(maxit=5,gcvtol=10^-5,alpha=1,inmaxit=100,intol=10^-5,insub=10^4)
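For example (illustrative values), allowing more outer iterations with a stricter GACV tolerance could be requested via

    gcvopts=list(maxit=10, gcvtol=10^-6, alpha=1, inmaxit=100, intol=10^-5, insub=10^4)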

knotcheck

If TRUE, only unique knots are used (for stability).

gammas

List of initial smoothing parameters for each predictor. See Details.
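For example (an illustrative fragment with hypothetical predictor names), starting values for the predictor-specific smoothing parameters could be supplied as

    gammas=list(x1v=1, x2v=1)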

weights

Vector of positive weights for fitting (default is vector of ones).

gcvtype

Cross-validation criterion for selecting smoothing parameters (see Details).

Details

See bigssg and the example below for more details.

Value

An object of class "makessg", which can be input to bigssg.

Warning

When inputting a "makessg" class object into bigssg, the formula input to bigssg must be a nested version of the original formula input to makessg. In other words, you cannot add any new effects after a "makessg" object has been created, but you can drop (remove) effects from the model.

Author(s)

Nathaniel E. Helwig <helwig@umn.edu>

References

Gu, C. (2013). Smoothing spline ANOVA models, 2nd edition. New York: Springer.

Gu, C. and Xiang, D. (2001). Cross-validating non-Gaussian data: Generalized approximate cross-validation revisited. Journal of Computational and Graphical Statistics, 10, 581-591.

Helwig, N. E. (2017). Regression with ordered predictors via ordinal smoothing splines. Frontiers in Applied Mathematics and Statistics, 3(15), 1-13.

Helwig, N. E. and Ma, P. (2015). Fast and stable multiple smoothing parameter selection in smoothing spline analysis of variance models with large samples. Journal of Computational and Graphical Statistics, 24, 715-732.

Helwig, N. E. and Ma, P. (2016). Smoothing spline ANOVA for super-large samples: Scalable computation via rounding parameters. Statistics and Its Interface, 9, 433-444.

Examples


##########   EXAMPLE  ##########

# function with two continuous predictors
set.seed(1)
myfun <- function(x1v,x2v){
  sin(2*pi*x1v) + log(x2v+.1) + cos(pi*(x1v-x2v))
}
ndpts <- 1000
x1v <- runif(ndpts)
x2v <- runif(ndpts)

# binomial response (no weights)
set.seed(773)
lp <- myfun(x1v,x2v)
p <- 1/(1+exp(-lp))
y <- rbinom(n=ndpts,size=1,prob=p)

# fit 2 possible models (create information 2 separate times)
system.time({
  intmod <- bigssg(y~x1v*x2v,family="binomial",type=list(x1v="cub",x2v="cub"),nknots=50)
  addmod <- bigssg(y~x1v+x2v,family="binomial",type=list(x1v="cub",x2v="cub"),nknots=50)
})

# fit 2 possible models (create information 1 time)
system.time({
  makemod <- makessg(y~x1v*x2v,family="binomial",type=list(x1v="cub",x2v="cub"),nknots=50)
  int2mod <- bigssg(y~x1v*x2v,data=makemod)
  add2mod <- bigssg(y~x1v+x2v,data=makemod)
})

# check difference (no difference)
crossprod( intmod$fitted.values - int2mod$fitted.values )
crossprod( addmod$fitted.values - add2mod$fitted.values )
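
# per the Warning above: effects may be dropped from the original formula,
# but new effects cannot be added once the "makessg" object exists
# (the lines below are illustrative only and are expected to fail,
#  since x3v was not part of the original makessg call)
# x3v <- runif(ndpts)
# bigssg(y~x1v*x2v+x3v,data=makemod)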

