sop {SOP}R Documentation

Estimation of generalised additive P-spline regression models with overlapping penalties.

Description

The function sop() fits generalised additive regression models. For the smooth terms, it uses P-splines (Eilers and Marx, 1996) and it can cope with one, two and three dimensional smooth terms. The innovation of the function is that smoothing/variance parameters are estimated on the basis of the SOP method; see Rodriguez-Alvarez et al. (2015) and Rodriguez-Alvarez et al. (2019) for details. This speeds up the fit.

Usage

sop(formula, data = list(),  family = gaussian(), weights = NULL, offset = NULL, 
    control = sop.control(), fit = TRUE)

Arguments

formula

a sop formula. This is exactly like the formula for a GLM except that (1) P-splines in one, two and three dimensions (f), (2)spatially adaptive P-splines in 1 dimension (ad); and (3) random effects (rae) can be added to the right hand side of the formula.

data

a data frame containing the model response variable and covariates required by the formula.

family

object of class family specifying the distribution and link function.

weights

prior weights on the contribution of the data to the log likelihood. Note that a weight of 2, for example, is equivalent to having made exactly the same observation twice. If you want to reweight the contributions of each datum without changing the overall magnitude of the log likelihood, then you should normalize the weights e.g. weights <- weights/mean(weights)). If NULL (default), the weights are considered to be one.

offset

this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of observations.

control

a list of control values to replace the default values returned by the function sop.control.

fit

logical. If TRUE, the model is fitted.

Details

The sop() can be used to fit generalised additive models. It works similarly to the function gam() of the package mgcv. The function sop() uses P-splines (Eilers and Marx, 1996), one of the option on gam(). Estimation is based on the equivalence between P-splines and linear mixed models, and variance/smoothing parameters are estimated based on restricted maximum likelihood (REML) using the separation of overlapping precision matrices (SOP) method described in Rodriguez-Alvarez et al. (2015) and Rodriguez-Alvarez et al. (2019). The function sop() can be seen as a faster alternative to gam() for some data sets.

Value

An object of class ‘sop’. It is a list containing the following objects:

b.fixed

the estimated fixed effect coefficients (present if fit = TRUE).

b.random

the predicted random effect coefficients (present if fit = TRUE).

fitted.values

the fitted values (present if fit = TRUE).

linear.predictor

the values of the linear predictor (present if fit = TRUE).

residuals

the (deviance) residuals (present if fit = TRUE).

X

the fixed effect design matrix.

Z

the random effect design matrix.

G

a list containing information about the precision/penalty matrices (one for each smoothing/variance parameter in the model).

y

the response

weights

the prior weights.

family

the distribution family.

out

a list with i) tol.ol tolerance parameter (outer loop); ii) it.ol number of iteration (outer loop); iii) tol.il tolerance parameter (inner loop); it.il (number of iteration (inner loop)), iv) vc variance components estimates, v) edf effective degrees of freedom (present if fit = TRUE).

deviance

the deviance (present if fit = TRUE).

null.deviance

the null deviance (present if fit = TRUE).

Vp

Bayesian posterior covariance matrix for the coefficients (present if fit = TRUE).

call

the function call.

data

the data.

formula

the model formula.

lin

a list containing information about the parametric/linear.

random

a list containing information about the random effects.

f

a list containing information about the smoothers.

na.action

vector with the observations (position) deleted due to missingness.

names.terms

the terms used in the formula.

model.terms

the explanatory variables.

nterms

the number of linear, random and smooth terms in the formula.

References

Eilers, P.H.C. and Marx, B.D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11 (2), 89–121.

Rodriguez-Alvarez, M.X., Lee, D. J., Kneib, T., Durban, M., and Eilers, P. (2015). Fast smoothing parameter separation in multidimensional generalized P-splines: the SAP algorithm. Statistics and Computing, 25 (5), 941–957.

Rodriguez-Alvarez, M.X., Durban, M., Lee, D. J. and Eilers, P. (2019). On the estimation of variance parameters in non-standard generalised linear mixed models: application to penalised smoothing. Statistics and Computing, 29 (3), 483–500.

Examples

library(SOP)
## An example of use of SOP package with tensor product B-splines in 2D
# Simulate the data
set.seed(123)
n <- 1000
sigma <- 0.1
x1 <- runif(n, -1, 1)
x2 <- runif(n, -1, 1)
f0 <- function(x1, x2) cos(2*pi*sqrt((x1 - 0.5)^2 + (x2 - 0.5)^2))
f <- f0(x1, x2)
y <- f + rnorm(n, 0, sigma)

dat <- data.frame(x1 = x1, x2 = x2, y = y)

# Theoretical surface
np <- 50
x1p <- seq(-1, 1, length = np)  
x2p <- seq(-1, 1, length = np)
fp <- cos(2 * pi * sqrt(outer((x1p - 0.5) ^ 2, (x2p - 0.5) ^ 2, '+')))

image(x1p, x2p, matrix(fp, np, np), main = 'f(x1,x2) - Theor', 
     col = topo.colors(100))

# Fit the model
m0 <- sop(formula = y ~ f(x1, x2, nseg = 10), data = dat, 
        control = list(trace = FALSE))

summary(m0)
plot(m0, col = topo.colors(100))

## An example of use of SOP package with several smooth terms and Gamma distribution
# Simulate the data
set.seed(123)
n <- 1000
alpha <- 0.75
x0 <- runif(n)
x1 <- x0*alpha + (1-alpha)*runif(n)
x2 <- runif(n)
x3 <- x2*alpha + (1-alpha)*runif(n)
x4 <- runif(n)
x5 <- runif(n)

f0 <- function(x)2*sin(pi*x)
f1 <- function(x)exp(2*x)
f2 <- function(x) 0.2*x^11*(10*(1-x))^6+10*(10*x)^3*(1-x)^10

f <- f0(x0) + f1(x1) + f2(x2)
y <- rgamma(f,exp(f/4),scale=1.2)

df <- data.frame(y = y, x0 = x0, x1 = x1, x2 = x2, x3 = x3, x4 = x4, x5 = x5)

# Fit the model
m1 <- sop(formula = y ~ f(x0, nseg = 17) + 
                             f(x1, nseg = 17) + 
                             f(x2, nseg = 17) + 
                             f(x3, nseg = 17) + 
                             f(x4, nseg = 17) + 
                             f(x5, nseg = 17), family = Gamma(link = log), data = df)
summary(m1)
plot(m1)

## An example of use of SOP package for the analysis of field trials experiments.
## Taken from the SpATS package.
require(SpATS)
data(wheatdata)

# Create factor variable for row and columns
wheatdata$R <- as.factor(wheatdata$row)
wheatdata$C <- as.factor(wheatdata$col)

# package SOP
m2 <- sop(formula = yield ~ colcode + rowcode + 
  f(col, row, nseg = c(10, 10)) + 
  rae(geno) + rae(R) + rae(C), data = wheatdata)
summary(m2)
plot(m2, col = topo.colors(100), pages = 1)

# Package SpATS: more adequate for this analysis.
# SpATS has been explicitly developed for the analysis field trials experiments. 
m3 <- SpATS(response = "yield", 
    spatial = ~ SAP(col, row, nseg = c(10,10), degree = 3, pord = 2, center = TRUE), 
    genotype = "geno",
    genotype.as.random = TRUE,
    fixed = ~ colcode + rowcode, random = ~ R + C, data = wheatdata, 
    control =  list(tolerance = 1e-06))
summary(m3)
plot(m3)

[Package SOP version 1.0-1 Index]