sop {SOP} | R Documentation |
Estimation of generalised additive P-spline regression models with overlapping penalties.
Description
The function sop()
fits generalised additive regression models. For the smooth terms, it uses P-splines (Eilers and Marx, 1996) and it can cope with one, two and three dimensional smooth terms. The innovation of the function is that smoothing/variance parameters are estimated on the basis of the SOP method; see Rodriguez-Alvarez et al. (2015) and Rodriguez-Alvarez et al. (2019) for details. This speeds up the fit.
Usage
sop(formula, data = list(), family = gaussian(), weights = NULL, offset = NULL,
control = sop.control(), fit = TRUE)
Arguments
formula |
a sop formula. This is exactly like the formula for a GLM except that (1) P-splines in one, two and three dimensions ( |
data |
a data frame containing the model response variable and covariates required by the formula. |
family |
object of class |
weights |
prior weights on the contribution of the data to the log likelihood. Note that a weight of 2, for example, is equivalent to having made exactly the same observation twice. If you want to reweight the contributions of each datum without changing the overall magnitude of the log likelihood, then you should normalize the weights e.g. |
offset |
this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of observations. |
control |
a list of control values to replace the default values returned by the function |
fit |
logical. If TRUE, the model is fitted. |
Details
The sop()
can be used to fit generalised additive models. It works similarly to the function gam()
of the package mgcv. The function sop()
uses P-splines (Eilers and Marx, 1996), one of the option on gam()
. Estimation is based on the equivalence between P-splines and linear mixed models, and variance/smoothing parameters are estimated based on restricted maximum likelihood (REML) using the separation of overlapping precision matrices (SOP) method described in Rodriguez-Alvarez et al. (2015) and Rodriguez-Alvarez et al. (2019).
The function sop()
can be seen as a faster alternative to gam()
for some data sets.
Value
An object of class ‘sop’. It is a list containing the following objects:
b.fixed |
the estimated fixed effect coefficients (present if |
b.random |
the predicted random effect coefficients (present if |
fitted.values |
the fitted values (present if |
linear.predictor |
the values of the linear predictor (present if |
residuals |
the (deviance) residuals (present if |
X |
the fixed effect design matrix. |
Z |
the random effect design matrix. |
G |
a list containing information about the precision/penalty matrices (one for each smoothing/variance parameter in the model). |
y |
the response |
weights |
the prior weights. |
family |
the distribution family. |
out |
a list with i) |
deviance |
the deviance (present if |
null.deviance |
the null deviance (present if |
Vp |
Bayesian posterior covariance matrix for the coefficients (present if |
call |
the function call. |
data |
the data. |
formula |
the model formula. |
lin |
a list containing information about the parametric/linear. |
random |
a list containing information about the random effects. |
f |
a list containing information about the smoothers. |
na.action |
vector with the observations (position) deleted due to missingness. |
names.terms |
the terms used in the formula. |
model.terms |
the explanatory variables. |
nterms |
the number of linear, random and smooth terms in the formula. |
References
Eilers, P.H.C. and Marx, B.D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11 (2), 89–121.
Rodriguez-Alvarez, M.X., Lee, D. J., Kneib, T., Durban, M., and Eilers, P. (2015). Fast smoothing parameter separation in multidimensional generalized P-splines: the SAP algorithm. Statistics and Computing, 25 (5), 941–957.
Rodriguez-Alvarez, M.X., Durban, M., Lee, D. J. and Eilers, P. (2019). On the estimation of variance parameters in non-standard generalised linear mixed models: application to penalised smoothing. Statistics and Computing, 29 (3), 483–500.
Examples
library(SOP)
## An example of use of SOP package with tensor product B-splines in 2D
# Simulate the data
set.seed(123)
n <- 1000
sigma <- 0.1
x1 <- runif(n, -1, 1)
x2 <- runif(n, -1, 1)
f0 <- function(x1, x2) cos(2*pi*sqrt((x1 - 0.5)^2 + (x2 - 0.5)^2))
f <- f0(x1, x2)
y <- f + rnorm(n, 0, sigma)
dat <- data.frame(x1 = x1, x2 = x2, y = y)
# Theoretical surface
np <- 50
x1p <- seq(-1, 1, length = np)
x2p <- seq(-1, 1, length = np)
fp <- cos(2 * pi * sqrt(outer((x1p - 0.5) ^ 2, (x2p - 0.5) ^ 2, '+')))
image(x1p, x2p, matrix(fp, np, np), main = 'f(x1,x2) - Theor',
col = topo.colors(100))
# Fit the model
m0 <- sop(formula = y ~ f(x1, x2, nseg = 10), data = dat,
control = list(trace = FALSE))
summary(m0)
plot(m0, col = topo.colors(100))
## An example of use of SOP package with several smooth terms and Gamma distribution
# Simulate the data
set.seed(123)
n <- 1000
alpha <- 0.75
x0 <- runif(n)
x1 <- x0*alpha + (1-alpha)*runif(n)
x2 <- runif(n)
x3 <- x2*alpha + (1-alpha)*runif(n)
x4 <- runif(n)
x5 <- runif(n)
f0 <- function(x)2*sin(pi*x)
f1 <- function(x)exp(2*x)
f2 <- function(x) 0.2*x^11*(10*(1-x))^6+10*(10*x)^3*(1-x)^10
f <- f0(x0) + f1(x1) + f2(x2)
y <- rgamma(f,exp(f/4),scale=1.2)
df <- data.frame(y = y, x0 = x0, x1 = x1, x2 = x2, x3 = x3, x4 = x4, x5 = x5)
# Fit the model
m1 <- sop(formula = y ~ f(x0, nseg = 17) +
f(x1, nseg = 17) +
f(x2, nseg = 17) +
f(x3, nseg = 17) +
f(x4, nseg = 17) +
f(x5, nseg = 17), family = Gamma(link = log), data = df)
summary(m1)
plot(m1)
## An example of use of SOP package for the analysis of field trials experiments.
## Taken from the SpATS package.
require(SpATS)
data(wheatdata)
# Create factor variable for row and columns
wheatdata$R <- as.factor(wheatdata$row)
wheatdata$C <- as.factor(wheatdata$col)
# package SOP
m2 <- sop(formula = yield ~ colcode + rowcode +
f(col, row, nseg = c(10, 10)) +
rae(geno) + rae(R) + rae(C), data = wheatdata)
summary(m2)
plot(m2, col = topo.colors(100), pages = 1)
# Package SpATS: more adequate for this analysis.
# SpATS has been explicitly developed for the analysis field trials experiments.
m3 <- SpATS(response = "yield",
spatial = ~ SAP(col, row, nseg = c(10,10), degree = 3, pord = 2, center = TRUE),
genotype = "geno",
genotype.as.random = TRUE,
fixed = ~ colcode + rowcode, random = ~ R + C, data = wheatdata,
control = list(tolerance = 1e-06))
summary(m3)
plot(m3)