makessg {bigsplines}    R Documentation

Makes Objects to Fit Generalized Smoothing Spline ANOVA Models

Description

This function creates a list containing the necessary information to fit a generalized smoothing spline ANOVA model (see bigssg).

Usage

makessg(formula,family,data,type=NULL,nknots=NULL,rparm=NA,
        lambdas=NULL,skip.iter=TRUE,se.lp=FALSE,rseed=1234,
        gcvopts=NULL,knotcheck=TRUE,gammas=NULL,weights=NULL,
        gcvtype=c("acv","gacv","gacv.old"))

Arguments

formula

An object of class "formula": a symbolic description of the model to be fitted (see Details and Examples for more information).

family

Distribution for response. One of five options: "binomial", "poisson", "Gamma", "inverse.gaussian", or "negbin". See bigssg.

data

Optional data frame, list, or environment containing the variables in formula.

type

List of smoothing spline types for predictors in formula (see Details). Options include type="cub" for cubic, type="acub" for another cubic, type="per" for cubic periodic, type="tps" for cubic thin-plate, and type="nom" for nominal.
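For example (an illustrative fragment with hypothetical predictor names), a model with one continuous predictor and one unordered factor might specify

    type=list(x1v="cub", grp="nom")

so that x1v is fit with a cubic spline and grp enters as a nominal effect.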

nknots

Two possible options: (a) scalar giving total number of random knots to sample, or (b) vector indexing which rows of data to use as knots.
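For example (illustrative values), nknots=100 samples 100 random knots, whereas a vector of row indices such as

    nknots=seq(1, 1000, by=20)

uses every 20th row of a 1000-row data set as a knot.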

rparm

List of rounding parameters for each predictor. See Details.
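For example (an illustrative fragment with hypothetical predictor names), rounding two continuous predictors to the nearest 0.01 could be requested via

    rparm=list(x1v=0.01, x2v=0.01)

which can substantially reduce the computational burden for large samples (see Helwig and Ma, 2016).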

lambdas

Vector of global smoothing parameters to try. Default uses lambdas=10^-c(9:0).
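A finer grid may be supplied if desired, e.g. (illustrative values)

    lambdas=10^seq(-9, 0, by=0.5)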

skip.iter

Logical indicating whether to skip the iterative smoothing parameter update. Using skip.iter=FALSE should provide a solution closer to the optimum, but the fitting time may be substantially longer (see the Computational Details section of bigssg).

se.lp

Logical indicating if the standard errors of the linear predictors (eta) should be estimated.

rseed

Random seed for knot sampling. Input is ignored if nknots is an input vector of knot indices. Set rseed=NULL to obtain a different knot sample each time, or set rseed to any positive integer to use a different seed than the default.

gcvopts

Control parameters for optimization. List with 6 elements: (i) maxit: maximum number of outer iterations, (ii) gcvtol: convergence tolerance for the iterative GACV update, (iii) alpha: tuning parameter for GACV minimization, (iv) inmaxit: maximum number of inner iterations for iteratively reweighted fitting, (v) intol: inner convergence tolerance for iteratively reweighted fitting, and (vi) insub: number of data points to subsample when checking inner convergence. Default: gcvopts=list(maxit=5,gcvtol=10^-5,alpha=1,inmaxit=100,intol=10^-5,insub=10^4)
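For example (illustrative values), allowing more outer iterations with a stricter GACV tolerance could be requested via

    gcvopts=list(maxit=10, gcvtol=10^-6, alpha=1, inmaxit=100, intol=10^-5, insub=10^4)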

knotcheck

If TRUE, only unique knots are used (for stability).

gammas

List of initial smoothing parameters for each predictor. See Details.
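For example (an illustrative fragment with hypothetical predictor names), starting values for the predictor-specific smoothing parameters could be supplied as

    gammas=list(x1v=1, x2v=1)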

weights

Vector of positive weights for fitting (default is vector of ones).

gcvtype

Cross-validation criterion for selecting smoothing parameters (see Details).

Details

See bigssg and the example below for more details.

Value

An object of class "makessg", which can be input to bigssg.

Warning

When inputting a "makessg" class object into bigssg, the formula input to bigssg must be a nested version of the original formula input to makessg. In other words, you cannot add any new effects after a "makessg" object has been created, but you can drop (remove) effects from the model.

Author(s)

Nathaniel E. Helwig <helwig@umn.edu>

References

Gu, C. (2013). Smoothing spline ANOVA models, 2nd edition. New York: Springer.

Gu, C. and Xiang, D. (2001). Cross-validating non-Gaussian data: Generalized approximate cross-validation revisited. Journal of Computational and Graphical Statistics, 10, 581-591.

Helwig, N. E. (2017). Regression with ordered predictors via ordinal smoothing splines. Frontiers in Applied Mathematics and Statistics, 3(15), 1-13.

Helwig, N. E. and Ma, P. (2015). Fast and stable multiple smoothing parameter selection in smoothing spline analysis of variance models with large samples. Journal of Computational and Graphical Statistics, 24, 715-732.

Helwig, N. E. and Ma, P. (2016). Smoothing spline ANOVA for super-large samples: Scalable computation via rounding parameters. Statistics and Its Interface, 9, 433-444.

Examples


##########   EXAMPLE  ##########

# function with two continuous predictors
set.seed(1)
myfun <- function(x1v,x2v){
  sin(2*pi*x1v) + log(x2v+.1) + cos(pi*(x1v-x2v))
}
ndpts <- 1000
x1v <- runif(ndpts)
x2v <- runif(ndpts)

# binomial response (no weights)
set.seed(773)
lp <- myfun(x1v,x2v)
p <- 1/(1+exp(-lp))
y <- rbinom(n=ndpts,size=1,prob=p)

# fit 2 possible models (create information 2 separate times)
system.time({
  intmod <- bigssg(y~x1v*x2v,family="binomial",type=list(x1v="cub",x2v="cub"),nknots=50)
  addmod <- bigssg(y~x1v+x2v,family="binomial",type=list(x1v="cub",x2v="cub"),nknots=50)
})

# fit 2 possible models (create information 1 time)
system.time({
  makemod <- makessg(y~x1v*x2v,family="binomial",type=list(x1v="cub",x2v="cub"),nknots=50)
  int2mod <- bigssg(y~x1v*x2v,data=makemod)
  add2mod <- bigssg(y~x1v+x2v,data=makemod)
})

# check difference (no difference)
crossprod( intmod$fitted.values - int2mod$fitted.values )
crossprod( addmod$fitted.values - add2mod$fitted.values )
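
# per the Warning above: effects may be dropped from the original formula,
# but new effects cannot be added once the "makessg" object exists
# (the lines below are illustrative only and are expected to fail,
#  since x3v was not part of the original makessg call)
# x3v <- runif(ndpts)
# bigssg(y~x1v*x2v+x3v,data=makemod)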

