R: Additional learning of local processes and prediction for...

addlearn_local {spmoran}

R Documentation

Additional learning of local processes and prediction for large samples

Description

This function performs an additional learning of local variations in spatially varying coefficients. While the SVC model implemented in resf_vc or besf_vc can be less accurate for large samples (e.g., n > 5,000) due to a degeneracy/over-smoothing problem, this additional learning mitigates this problem by synthesizing/averaging the model with local SVC models. The resulting spatial prediction implemented in this function is expected to be more accurate than the resf_vc function.

Usage

addlearn_local( mod, meig0 = NULL, x0 = NULL, xconst0=NULL, xgroup0=NULL,
           cl_num=NULL, cl=NULL, parallel=FALSE, ncores=NULL )

Arguments

`mod`	Outpot from `resf_vc` or `besf_vc` function
`meig0`	Moran eigenvectors at prediction sites. Output from `meigen0`
`x0`	Matrix of explanatory variables at prediction sites whose coefficients are allowed to vary across geographical space (N_0 x K). Default is NULL
`xconst0`	Matrix of explanatory variables at prediction sites whose coefficients are assumed constant (or NVC) across space (N_0 x K_const). Default is NULL
`xgroup0`	Matrix of group indeces at prediction sites that may be group IDs (integers) or group names (N_0 x K_g). Default is NULL
`cl_num`	Number of local sub-models being aggregated/averaged. If NULL, the number is determined so that the number of samples per sub-model equals approximately 600. Default is NULL
`cl`	Vector of cluster ID for each sample (N x 1). If specified, the local sub-models are given by this ID. If NULL, k-means clustering based on spatial coordinates is performed to obtain spatial clusters each of which contain approximately 600 samples. Default is NULL
`parallel`	If TRUE, the model is estimated through parallel computation. The default is FALSE for `resf_vc` while TRUE for `besf_vc`
`ncores`	Number of cores used for the parallel computation. If ncores = NULL and parallel = TRUE, the number of available cores is detected. Default is NULL

Value

`b_vc`	Matrix of estimated spatially varying coefficients (SVCs) on x (N x K)
`bse_vc`	Matrix of standard errors for the SVCs on x (N x k)
`z_vc`	Matrix of z-values for the SVCs on x (N x K)
`p_vc`	Matrix of p-values for the SVCs on x (N x K)
`c`	Matrix with columns for the estimated coefficients on xconst, their standard errors, z-values, and p-values (K_c x 4)
`b_g`	List of K_g matrices with columns for the estimated group effects, their standard deviations, and t-values
`s`	List of 2 elements summarizing variance parameters characterizing SVCs of each local sub-model. The first element contains standard deviations of each SVCs while the second elementcontains their Moran's I values that are scaled to take a value between 0 (no spatial dependence) and 1 (strongest positive spatial dependence). Based on Griffith (2003), the scaled Moran'I value is interpretable as follows: 0.25-0.50:weak; 0.50-0.70:moderate; 0.70-0.90:strong; 0.90-1.00:marked
`s_global`	The same variance parameters for the globa sub-model
`s_g`	Vector of standard deviations of the group effects
`e`	Error statistics. It includes residual standard error (resid_SE), adjusted conditional R2 (adjR2(cond)), restricted log-likelihood (rlogLik), Akaike information criterion (AIC), and Bayesian information criterion (BIC)
`pred`	Matrix of predicted values for y (pred) and their standard errors (pred_se) (N x 2)
`resid`	Vector of residuals (N x 1)
`cl`	Vector of cluster ID being used (N x 1)
`pred0`	Matrix of predicted values for y (pred) and their standard errors (pred_se) at prediction sites (N_0 x 2)
`b_vc0`	Matrix of estimated spatially varying coefficients (SVCs) at prediction sites (N_0 x K)
`bse_vc0`	Matrix of standard errors for the SVCs at prediction sites (N_0 x k)
`z_vc0`	Matrix of z-values for the SVCs at prediction sites (N x K)
`p_vc0`	Matrix of p-values for the SVCs at prediction sites (N x K)
`other`	List of other outputs, which are internally used

Author(s)

Daisuke Murakami

References

Murakami, D., Sugasawa, S., T., Seya, H., and Griffith, D.A. (2024) Sub-model aggregation-based scalable eigenvector spatial filtering: application to spatially varying coefficient modeling. Arxiv.

Examples

require(spdep)
data(house)
dat0    <- data.frame(house@coords,house@data)
dat     <- dat0[dat0$yrbuilt>=1980,]

###### purpose 1: improve SVC modeling accuracy ######
###### (i.e., addressing the over-smoothing problem) #
y	      <- log(dat[,"price"])
x       <- dat[,c("age","rooms")]
xconst  <- dat[,c("lotsize","s1994","s1995","s1996","s1997","s1998")]
coords  <- dat[ ,c("long","lat")]
meig    <- meigen_f( coords )

## Not run: Remove # and run
# res0  <- resf_vc(y = y,x = x, xconst = xconst, meig = meig)
# res   <- addlearn_local(res0) # It adjusts SVCs to model local patterns
# res

####### parallel version for very large samples (e.g., n >100,000)
# bes0  <- besf_vc(y = y,x = x, xconst = xconst, coords=coords)
# bes	  <- addlearn_local( bes0 )


####### purpose 2: improve predictive accuracy ########

#samp    <- sample( dim( dat )[ 1 ], 2500)
#d       <- dat[ samp, ]    ## Data at observed sites
#y	     <- log(d[,"price"])
#x       <- d[,c("age","rooms")]
#xconst  <- d[,c("lotsize","s1994","s1995","s1996","s1997","s1998")]
#coords  <- d[ ,c("long","lat")]

#d0      <- dat[-samp, ]    ## Data at observed sites
#y0	     <- log(d0[,"price"])
#x0      <- d0[,c("age","rooms")]
#xconst0 <- d0[,c("lotsize","s1994","s1995","s1996","s1997","s1998")]
#coords0 <- d0[ ,c("long","lat")]

#meig    <- meigen_f( coords )
#res0    <- resf_vc(y = y,x = x, xconst = xconst, meig = meig)
#meig0   <- meigen0( meig=meig, coords0=coords0 )
#res     <- addlearn_local(res0, meig0=meig0, x0=x0, xconst0=xconst0) #
#pred    <- res$pred0       ## Predictive values

[Package spmoran version 0.2.3 Index]