R: MLE of SVC model

SVC_mle {varycoef}

R Documentation

MLE of SVC model

Description

Conducts a maximum likelihood estimation (MLE) for a Gaussian process-based spatially varying coefficient model as described in Dambon et al. (2021) doi: 10.1016/j.spasta.2020.100470.

Usage

SVC_mle(...)

## Default S3 method:
SVC_mle(y, X, locs, W = NULL, control = NULL, optim.control = list(), ...)

## S3 method for class 'formula'
SVC_mle(
  formula,
  data,
  RE_formula = NULL,
  locs,
  control = NULL,
  optim.control = list(),
  ...
)

Arguments

`...`	further arguments
`y`	(`numeric(n)`) Response vector.
`X`	(`matrix(n, p)`) Design matrix. Intercept has to be added manually.
`locs`	(`matrix(n, d)`) Locations in a `d`-dimensional space. May contain multiple observations at single location.
`W`	(`NULL` or `matrix(n, q)`) If `NULL`, the same matrix as provided in `X` is used. This fits a full SVC model, i.e., each covariate effect is modeled with a mean and an SVC. In this case we have `p = q`. If optional matrix `W` is provided, SVCs are only modeled for covariates within matrix `W`.
`control`	(`list`) Control paramaters given by `SVC_mle_control`.
`optim.control`	(`list`) Control arguments for optimization function, see Details in `optim`.
`formula`	Formula describing the fixed effects in SVC model. The response, i.e. LHS of the formula, is not allowed to have functions such as `sqrt()` or `log()`.
`data`	data frame containing the observations
`RE_formula`	Formula describing the random effects in SVC model. Only RHS is considered. If `NULL`, the same RHS of argument `formula` for fixed effects is used.

Details

The GP-based SVC model is defined with some abuse of notation as:

y(s) = X \mu + W \eta (s) + \epsilon(s)

where:

y is the response (vector of length n)
X is the data matrix for the fixed effects covariates. The dimensions are n times p. This leads to p fixed effects.
\mu is the vector containing the fixed effects
W is the data matrix for the SVCs modeled by GPs. The dimensions are n times q. This lead to q SVCs in the model.
\eta are the SVCs represented by a GP.
\epsilon is the nugget effect

The MLE is an numeric optimization that runs optim or (if parallelized) optimParallel.

You can call the function in two ways. Either, you define the model matrices yourself and provide them using the arguments X and W. As usual, the individual columns correspond to the fixed and random effects, i.e., the Gaussian processes, respectively. The second way is to call the function with formulas, like you would in lm. From the data.frame provided in argument data, the respective model matrices as described above are implicitly built. Using simple arguments formula and RE_formula with data column names, we can decide which covariate is modeled with a fixed or random effect (SVC).

Note that similar to model matrix call from above, if the RE_formula is not provided, we use the one as in argument formula. Further, note that the intercept is implicitly constructed in the model matrix if not prohibited.

Value

Object of class SVC_mle if control$extract_fun = FALSE, meaning that a MLE has been conducted. Otherwise, if control$extract_fun = TRUE, the function returns a list with two entries:

obj_fun: the objective function used in the optimization
args: the arguments to evaluate the objective function.

For further details, see description of SVC_mle_control.

Author(s)

Jakob Dambon

References

Dambon, J. A., Sigrist, F., Furrer, R. (2021) Maximum likelihood estimation of spatially varying coefficient models for large data with an application to real estate price prediction, Spatial Statistics doi: 10.1016/j.spasta.2020.100470

Examples

## ---- toy example ----
## We use the sampled, i.e., one dimensional SVCs
str(SVCdata)
# sub-sample data to have feasible run time for example
set.seed(123)
id <- sample(length(SVCdata$locs), 50)

## SVC_mle call with matrix arguments
fit <- with(SVCdata, SVC_mle(
  y[id], X[id, ], locs[id], 
  control = SVC_mle_control(profileLik = TRUE, cov.name = "mat32")))

## SVC_mle call with formula
df <- with(SVCdata, data.frame(y = y[id], X = X[id, -1]))
fit <- SVC_mle(
  y ~ X, data = df, locs = SVCdata$locs[id], 
  control = SVC_mle_control(profileLik = TRUE, cov.name = "mat32")
)
class(fit)

summary(fit)


## ---- real data example ----
require(sp)
## get data set
data("meuse", package = "sp")

# construct data matrix and response, scale locations
y <- log(meuse$cadmium)
X <- model.matrix(~1+dist+lime+elev, data = meuse)
locs <- as.matrix(meuse[, 1:2])/1000


## starting MLE
# the next call takes a couple of seconds
fit <- SVC_mle(
  y = y, X = X, locs = locs,
  # has 4 fixed effects, but only 3 random effects (SVC)
  # elev is missing in SVC
  W = X[, 1:3],
  control = SVC_mle_control(
    # inital values for 3 SVC
    # 7 = (3 * 2 covariance parameters + nugget)
    init = c(rep(c(0.4, 0.2), 3), 0.2),
    profileLik = TRUE
  )
)

## summary and residual output
summary(fit)
plot(fit)

## predict
# new locations
newlocs <- expand.grid(
  x = seq(min(locs[, 1]), max(locs[, 1]), length.out = 30),
  y = seq(min(locs[, 2]), max(locs[, 2]), length.out = 30))
# predict SVC for new locations
SVC <- predict(fit, newlocs = as.matrix(newlocs))
# visualization
sp.SVC <- SVC
coordinates(sp.SVC) <- ~loc_1+loc_2
spplot(sp.SVC, colorkey = TRUE)

[Package varycoef version 0.3.4 Index]