modgam {MapGAM} | R Documentation |
Fit a Generalized Additive Model (GAM) with a Two-Dimensional Smooth and Make Predictions
Description
Fits a crude or adjusted regression on a user-supplied grid for spatial analysis using a generalized additive model with a two-dimensional LOESS smooth for geolocation (or any other two-dimensional predictor). Includes optional pointwise confidence intervals and permutation tests for global and local tests of the null hypothesis that the two-dimensional predictor (e.g., geolocation) is not associated with the outcome. Most applications will pass the output of this function to plot.modgam
to map the resulting odds ratios or other effect estimates.
Usage
modgam(formula, rgrid, data, subset, offset, family = binomial(), permute = 0,
conditional = TRUE, pointwise = FALSE, m = "adjusted", sp = seq(0.05,0.95,0.05),
degree=1, keep=FALSE, type=c("spatial","all"), reference = "median",
se.fit = FALSE, alpha = 0.05, verbose = TRUE, ...)
Arguments
formula |
a formula expression, with the response on the left of a ~ oprator, and the predictors on the right. If fitting a Cox additive model, the response must be a survival object as returned by the Surv function. A built-in nonparametric smoothing term is indicated by |
rgrid |
a data frame containing the values of X and Y coordinates at which to generate predictions. If missing, the predictions will be generated for the two-dimentional predictor in |
data |
a data frame containing the variables in the model. If |
family |
a description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function or the result of a call to a family function. (See |
subset |
an optional vector specifying a subset of observations to be used in the fitting process. |
offset |
name in data and rgrid that will be used as offset in the model. It will be ignored if offset is specified in |
permute |
the number of permutations of the data set for testing the significance of the two-dimensional predictor. |
conditional |
a logical value indicating whether to run a conditional or unconditional permutation test; this argument is used only when |
pointwise |
a logical value indicated whether to include pointwise permutation tests for every grid point. This argument is only used
when |
m |
model type for the GAM. Options are "crude" or "adjusted" (default). If "crude", only the smooth for the two-dimensional predictor is included in the model (i.e., "crude" is a synonym for "unadjusted"). If "adjusted", all covariates in the data set (columns >= 4) are included in the model. |
sp |
the span size for the LOESS smooth of the two-dimensional predictor. If |
degree |
the degree of the polynomials to be used for LOESS smooth, normally 1 or 2. It will be ignored if the degree is indicated in |
keep |
a logical value indicating whether to store and return the pointwise odds ratios, and if conditional = FALSE the span size, for every permuted data set. These values aren't necessary for mapping or cluster identification, and storing them slows the permutation tests, so the default is |
type |
use |
reference |
referent for the predicted values or effects. If |
se.fit |
if |
alpha |
the significace level used for confidence intervals when |
verbose |
a logical value indicating whether to print the GAM model statement, the percentile rank for the global deviance statistic in the permutation test, and the progress of the permutation test (report completion of every 10 permutations). The default is |
... |
further arguments to be passed to the |
Details
The model used to fit the data is a generalized additive model with a LOESS smooth for a two-dimensional predictor such as geolocation (Hastie and Tibshirani, 1990; Kelsall and Diggle, 1998; Webster et al., 2006). Although any family and link function supported by the gam
function is supported, the default binomial family with logit link yields the following model:
\ln \left(\frac{\pi_{i}}{1-\pi_{i}}\right) = S(x_{i},y_{i}) + \mathbf{Z_{i}} \boldsymbol{\beta}
where \pi_{i}
is the probability that the outcome is 1 for participant i, x_{i}
and y_{i}
are predictor coordinates for participant i (i.e., projected distance east and north, respectively, from an arbitrarily defined origin location), S(.,.)
is a 2-dimensional smoothing function (currently LOESS), \mathbf{Z_{i}}
is a row vector of covariate values for participant i, and \boldsymbol{\beta}
is a vector of unknown regression parameters (including an intercept).
When a permutation test is requested, for each permutation the paired X and Y coordinates in the data set are randomly reassigned to participants, consistent with the null hypothesis that the geolocation (or another two-dimensional predictor entered in place of X and Y) is not associated with the outcome. Note that this procedure intentionally preserves associations between other covariates and the outcome variable so the permutation test reflects the significance of geolocation. See the references for more details.
Value
modgam
returns an object of class modgam
. It can be examined by print
and plot
.
grid |
a data frame with X and Y coordinates from rgrid. |
span |
the span size used for the LOESS smooth. If |
gamobj |
|
predobj |
the |
family |
the family and link function used for the model. |
fit |
the predicted values for each point on the grid. For Gaussian family models, predicted values are for the outcome variable at each grid point. For non-Gaussian families, predicted values are for the spatial effect alone on the scale of the linear predictor, compared to a referent specified through augument |
exp.fit |
the exponential of |
global.pvalue |
the p-value based on the likelihood ratio test comparing models with and without geolocation (or other two-dimensional predictor). |
If se.fit = TRUE
then the following values are provided:
se |
an estimated standard error for each predicted value. |
conf.low |
the lower bounds for pointwise (1- |
conf.high |
the higher bounds for pointwise (1- |
If permute > 0
then the following values are also provided:
global.permt |
the p-value based on the deviance statistic comparing models with and without geolocation (or other two-dimensional predictor), using the distribution of deviance statistics from the permuted data sets to represent the null distribution. For a test of H0: geolocation is unassociated with the outcome variable (adjusting for any covariates if |
pointwise.permt |
Only provided when |
If permute
> 0 and keep = T
then the following values are also provided:
permutations |
a matrix containing null permuted values for each point on the grid, with the results for each permutation in a separate column. For the binomial family and logit link these are provided as odds ratios, otherwise they are reported as linear predictors. |
deviances.permt |
a vector containing deviances for each permutation. |
Warning
Permutation tests are computationally intensive, often requiring several hours or more.
Author(s)
Lu Bai, Scott Bartell, Robin Bliss, and Veronica Vieira
Send bug reports to sbartell@uci.edu.
References
Bai L, Gillen DL, Bartell SM, Vieira VM. doi:10.32614/RJ-2020-001Mapping smoothed spatial effect estimates from individual-level data: MapGAM. The R Journal 2020, 12:32-48.
Hastie TJ, Tibshirani RJ. Generalized Additive Models. (Chapman & Hall/CRC Monographs on Statistics & Applied Probability, Boca Raton, Florida, 1990).
Kelsall J, Diggle P. doi:10.1111/1467-9876.00128Spatial variation in risk of disease: a nonparametric binary regression approach. J Roy Stat Soc C-App 1998, 47:559-573.
Vieira V, Webster T, Weinberg J, Aschengrau A, Ozonoff D. doi:10.1186/1476-069X-4-11Spatial analysis of lung, colorectal, and breast cancer on Cape Cod: An application of generalized additive models to case-control data. Environmental Health 2005, 4:11.
Webster T, Vieira V, Weinberg J, Aschengrau A. doi:10.1186/1476-072X-5-26Method for mapping population-based case-control studies using Generalized Additive Models. International Journal of Health Geographics 2006, 5:26.
Young RL, Weinberg J, Vieira V, Ozonoff A, Webster TF. doi:10.1186/1476-072X-9-37A power comparison of generalized additive models and the spatial scan statistic in a case-control setting. International Journal of Health Geographics 2010, 9:37.
See Also
predgrid
,
optspan
,
plot.modgam
colormap
.
Examples
# Load base map in SpatialPolygonsDataFrame format
# This map was read from ESRI shapefiles using the readShapePoly function (soon to be deprecated)
data(MAmap)
# Load data and create grid on base map
data(MAdata)
gamgrid <- predgrid(MAdata, map=MAmap)
# Fit crude logistic GAM to the MA data using span size of 50%
# and predict odds ratios for every point on gamgrid
fit1 <- modgam(data=MAdata, rgrid=gamgrid, m="crude", sp=0.5)
# Summary statistics for pointwise crude odds ratios
summary(fit1$exp.fit)
# Summary stats for pointwise crude log odds (linear predictor)
summary(fit1$fit)
# fit adjusted GAM using span size of 50%,
# including a (too small) conditional permutation test
fit2 <- modgam(data=MAdata, rgrid=gamgrid, permute=25, m="adjusted", sp=0.5)
fit2
# fit adjusted GAM by specifiying formula
fit2.f <- modgam(Case~lo(Xcoord,Ycoord)+Smoking + Mercury + Selenium,data=MAdata,
rgrid=gamgrid, sp=0.5)
fit2.f
# Detailed example with a continuous outcome variable, map projections,
# and data trimming: investigating tweet times by geolocation
# NOTE: this example requires the maps and mapproj packages
# Convert base map and beer tweet data locations to US Albers projection
# for better distance estimates than using (lat,long) as (X,Y) coords
if(require(maps) & require(mapproj)) {
USmap <- map("state",projection="albers",parameters=c(29.5,45.5),
plot=FALSE,fill=TRUE,col="transparent")
data(beertweets)
# Reuse last map projection to convert data coordinates
XY <- mapproject(beertweets$longitude,beertweets$latitude)[1:2]
jtime <- julian(beertweets$time)
# Convert tweet dates and times to time of day (24-hour)
tweettime <- as.numeric(jtime-trunc(jtime))*24
beerproj <- data.frame(tweettime, XY[1], XY[2], beertweets$beer)
# Generate grid on the US map, trimmed to range of beer data
USgrid <- predgrid(beerproj, map=USmap)
# Fit adjusted model--adjusting for beer indicator variable
fit3 <- modgam(data=beerproj, rgrid=USgrid, family=gaussian,
reference="none", m="adjusted", sp=0.05)
# Get summary statistics for predicted tweet times across grid points
summary(fit3$fit)
}
# Smoothing survival rates over geolocations with adjustement of Age and Insurance
# Including generating pointwise standard errors and confidence intervals
data(CAdata)
data(CAgrid)
data = CAdata[1:1000,] # use a subset of the California data
# manually set the categorical variables as factors
data$INS = factor(data$INS)
# no formula needed if data are arranged in a specific order (see \code{data} argument)
fit4 <- modgam(data=data, rgrid=CAgrid, family="survival", sp=0.2)
fit4
# fit the same model using the formula argument, with data columns in any order
fit4.f <- modgam(Surv(time,event)~AGE+factor(INS)+lo(X,Y), data=data,
rgrid = CAgrid, family="survival", sp=0.2)
# Smoothing for two-dimensional chemical exposure instead of geolocation
# case status in 1st column, mercury and selenium in 2nd and 3rd columns
ma2 <- MAdata[,c(1,5:6)]
expgrid <- predgrid(ma2)
fit5 <- modgam(data=ma2,rgrid=expgrid,sp=.5,m="crude")
summary(fit5$exp.fit)
# plot the results, with mercury on the X axis and selenium on the Y axis
plot(fit5, contours="response", arrow=FALSE, axes=TRUE)