gssanova0 {gss} | R Documentation |
Fitting Smoothing Spline ANOVA Models with Non-Gaussian Responses
Description
Fit smoothing spline ANOVA models in non-Gaussian regression. The
symbolic model specification via formula
follows the same
rules as in lm
and glm
.
Usage
gssanova0(formula, family, type=NULL, data=list(), weights, subset,
offset, na.action=na.omit, partial=NULL, method=NULL,
varht=1, nu=NULL, prec=1e-7, maxiter=30)
gssanova1(formula, family, type=NULL, data=list(), weights, subset,
offset, na.action=na.omit, partial=NULL, method=NULL,
varht=1, alpha=1.4, nu=NULL, id.basis=NULL, nbasis=NULL,
seed=NULL, random=NULL, skip.iter=FALSE)
Arguments
formula |
Symbolic description of the model to be fit. |
family |
Description of the error distribution. Supported
are exponential families |
type |
List specifying the type of spline for each variable.
See |
data |
Optional data frame containing the variables in the model. |
weights |
Optional vector of weights to be used in the fitting process. |
subset |
Optional vector specifying a subset of observations to be used in the fitting process. |
offset |
Optional offset term with known parameter 1. |
na.action |
Function which indicates what should happen when the data contain NAs. |
partial |
Optional symbolic description of parametric terms in partial spline models. |
method |
Score used to drive the performance-oriented
iteration. Supported are |
varht |
Dispersion parameter needed for |
nu |
Inverse scale parameter in accelerated life model families. Ignored for exponential families. |
prec |
Precision requirement for the iterations. |
maxiter |
Maximum number of iterations allowed for performance-oriented iteration, and for inner-loop multiple smoothing parameter selection when applicable. |
alpha |
Tuning parameter modifying GCV or Mallows' CL. |
id.basis |
Index designating selected "knots". |
nbasis |
Number of "knots" to be selected. Ignored when
|
seed |
Seed for reproducible random selection of "knots".
Ignored when |
random |
Input for parametric random effects in nonparametric
mixed-effect models. See |
skip.iter |
Flag indicating whether to use initial values of
theta and skip theta iteration. See |
Details
The model specification via formula
is intuitive. For
example, y~x1*x2
yields a model of the form
y = C + f_{1}(x1) + f_{2}(x2) + f_{12}(x1,x2) + e
with the terms denoted by "1"
, "x1"
, "x2"
, and
"x1:x2"
.
The model terms are sums of unpenalized and penalized terms. Attached to every penalized term there is a smoothing parameter, and the model complexity is largely determined by the number of smoothing parameters.
Only one link is implemented for each family
. It is the
logit link for "binomial"
, and the log link for
"poisson"
, "Gamma"
, and "inverse.gaussian"
.
For "nbinomial"
, the working parameter is the logit of the
probability p
; see NegBinomial
. For
"weibull"
, "lognorm"
, and "loglogis"
, it is the
location parameter for the log lifetime.
The models are fitted by penalized likelihood method through the
performance-oriented iteration as described in the reference. For
family="binomial"
, "poisson"
, "nbinomial"
,
"weibull"
, "lognorm"
, and "loglogis"
, the score
driving the performance-oriented iteration defaults to
method="u"
with varht=1
. For family="Gamma"
and "inverse.gaussian"
, the default is method="v"
.
gssanova0
uses the algorithm of ssanova0
for
the iterated penalized least squares problems, whereas
gssanova1
uses the algorithm of ssanova
.
In gssanova1
, a subset of the observations are selected as
"knots." Unless specified via id.basis
or nbasis
, the
number of "knots" q
is determined by max(30,10n^{2/9})
,
which is appropriate for the default cubic splines for numerical
vectors.
Value
gssanova0
returns a list object of class
c("gssanova0","ssanova0","gssanova")
.
gssanova1
returns a list object of class
c("gssanova","ssanova")
.
The method summary.gssanova0
or
summary.gssanova
can be used to obtain summaries of
the fits. The method predict.ssanova0
or
predict.ssanova
can be used to evaluate the fits at
arbitrary points along with standard errors, on the link scale. The
methods residuals.gssanova
and
fitted.gssanova
extract the respective traits from the
fits.
Responses
For family="binomial"
, the response can be specified either
as two columns of counts or as a column of sample proportions plus a
column of total counts entered through the argument weights
,
as in glm
.
For family="nbinomial"
, the response may be specified as two
columns with the second being the known sizes, or simply as a single
column with the common unknown size to be estimated through the
maximum likelihood.
For family="weibull"
, "lognorm"
, or "loglogis"
,
the response consists of three columns, with the first giving the
follow-up time, the second the censoring status, and the third the
left-truncation time. For data with no truncation, the third column
can be omitted.
For family="polr"
, the response should be an ordered factor.
Note
The direct cross-validation of gssanova
can be more
effective, and more stable for complex models.
For large sample sizes, the approximate solutions of
gssanova1
and gssanova
can be faster than
gssanova0
.
The results from gssanova1
may vary from run to run. For
consistency, specify id.basis
or set seed
.
The method project
is not implemented for
gssanova0
, nor is the mixed-effect model support through
mkran
.
In gss versions earlier than 1.0, gssanova0
was under
the name gssanova
.
References
Gu, C. (1992), Cross-validating non Gaussian data. Journal of Computational and Graphical Statistics, 1, 169-179.
Gu, C. (2013), Smoothing Spline ANOVA Models (2nd Ed). New York: Springer-Verlag.
GU, C. (2014), Smoothing Spline ANOVA Models: R Package gss. Journal of Statistical Software, 58(5), 1-25. URL http://www.jstatsoft.org/v58/i05/.
Examples
## Fit a cubic smoothing spline logistic regression model
test <- function(x)
{.3*(1e6*(x^11*(1-x)^6)+1e4*(x^3*(1-x)^10))-2}
x <- (0:100)/100
p <- 1-1/(1+exp(test(x)))
y <- rbinom(x,3,p)
logit.fit <- gssanova0(cbind(y,3-y)~x,family="binomial")
## The same fit
logit.fit1 <- gssanova0(y/3~x,"binomial",weights=rep(3,101))
## Obtain estimates and standard errors on a grid
est <- predict(logit.fit,data.frame(x=x),se=TRUE)
## Plot the fit and the Bayesian confidence intervals
plot(x,y/3,ylab="p")
lines(x,p,col=1)
lines(x,1-1/(1+exp(est$fit)),col=2)
lines(x,1-1/(1+exp(est$fit+1.96*est$se)),col=3)
lines(x,1-1/(1+exp(est$fit-1.96*est$se)),col=3)
## Clean up
## Not run: rm(test,x,p,y,logit.fit,logit.fit1,est)
dev.off()
## End(Not run)