prepare_data {disaggregation} R Documentation

## Prepare data for disaggregation modelling

### Description

The prepare_data function extracts all the data required for fitting a disaggregation model. Its output is designed to be passed to the disaggregation::fit_model function.

### Usage

prepare_data(
  polygon_shapefile,
  covariate_rasters,
  aggregation_raster = NULL,
  id_var = "area_id",
  response_var = "response",
  sample_size_var = NULL,
  mesh.args = NULL,
  na.action = FALSE,
  makeMesh = TRUE,
  ncores = 2
)


### Arguments

| Argument | Description |
|---|---|
| polygon_shapefile | SpatialPolygonsDataFrame containing at least two columns: one with the id for the polygons (id_var) and one with the response count data (response_var); for binomial data, i.e. survey data, it can also contain a sample size column (sample_size_var). |
| covariate_rasters | RasterStack of covariate rasters to be used in the model. |
| aggregation_raster | Raster used to aggregate pixel-level predictions to polygon level, e.g. population to aggregate prevalence. If this is not supplied, a uniform raster will be used. |
| id_var | Name of the column in the SpatialPolygonsDataFrame object with the polygon id. |
| response_var | Name of the column in the SpatialPolygonsDataFrame object with the response data. |
| sample_size_var | For survey data, name of the column in the SpatialPolygonsDataFrame object (if it exists) with the sample size data. |
| mesh.args | List of parameters that control the mesh structure, with the same names as used by INLA. |
| na.action | Logical. If TRUE, NAs in the response will be removed, covariate NAs will be given the median value, and aggregation NAs will be set to zero. Default FALSE (NAs in response or covariate data within the polygons will give errors). |
| makeMesh | Logical. If TRUE, build the INLA mesh, which takes some time. Default TRUE. |
| ncores | Number of cores used to perform covariate extraction. |

### Details

Takes a SpatialPolygonsDataFrame with the response data and a RasterStack of covariates.

Extracts the values of the covariates (as well as the aggregation raster, if given) at each pixel within the polygons (parallelExtract function). This is done in parallel, and the ncores argument sets the number of cores to use for covariate extraction. This can be set to the number of covariates used in the model.

The aggregation raster defines how the pixels within each polygon are aggregated. The disaggregation model computes a weighted sum of the pixel predictions, weighted by the pixel values in the aggregation raster. For disease incidence rate, for example, the population raster is used to aggregate pixel incidence rates by summing the number of cases (rate weighted by population). If no aggregation raster is provided, a uniform raster is assumed, i.e. the pixel predictions are aggregated to polygon level by summing the pixel values.
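The weighted sum described above can be illustrated with a minimal base R sketch. The vectors area_id, pixel_rate and population are hypothetical toy data, not objects produced by the package, and the code only mirrors the arithmetic, not the package's internal implementation.

```r
# Hypothetical toy data: five pixels belonging to two polygons
area_id    <- c(1, 1, 1, 2, 2)
pixel_rate <- c(0.10, 0.20, 0.30, 0.05, 0.15)  # predicted incidence rate per pixel
population <- c(100, 50, 10, 200, 40)          # aggregation raster value per pixel

# Weighted sum: cases per polygon = sum over pixels of rate * population
polygon_cases <- tapply(pixel_rate * population, area_id, sum)

# Uniform aggregation: every pixel gets weight 1, so this is a plain sum
uniform_weights <- rep(1, length(pixel_rate))
polygon_uniform <- tapply(pixel_rate * uniform_weights, area_id, sum)

print(polygon_cases)
print(polygon_uniform)
```

With a population raster the polygon totals are case counts; with uniform weights they are simply the summed pixel values.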

Makes a matrix that contains the start and end pixel index for each polygon (getStartendindex function). Builds an INLA mesh to use for the spatial field.
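As a rough illustration of what a start and end index matrix looks like, the sketch below derives one from a per-pixel polygon id vector in base R. It assumes the pixels are already grouped by polygon and uses one-based indices; this page does not state which indexing convention the package stores, and the area_id vector is hypothetical.

```r
# Hypothetical per-pixel polygon ids, already grouped by polygon
area_id <- c(1, 1, 1, 2, 2, 3, 3, 3, 3)

# Run-length encoding gives the number of consecutive pixels per polygon
runs <- rle(area_id)

# End index is the running total of pixels; start index follows from the run length
end_index   <- cumsum(runs$lengths)
start_index <- end_index - runs$lengths + 1

# Two-column matrix with one row per polygon (one-based indices, an assumption)
startendindex <- cbind(start = start_index, end = end_index)
rownames(startendindex) <- runs$values

print(startendindex)
```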

The mesh.args argument allows you to supply a list of INLA mesh parameters to control the mesh used for the spatial field (build_mesh function).

The na.action flag is off by default (na.action = FALSE). If there are any NAs in the response or covariate data within the polygons, the prepare_data function will raise an error. Ideally the NAs in the data would be dealt with beforehand; however, setting na.action = TRUE will handle them automatically: it removes any polygons that have NA as a response, sets any aggregation pixels with NA to zero, and sets covariate pixels with NA to the median value for that covariate.
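The three NA rules can be sketched in base R on toy data. The data frames, the covariate name temp and the order of the steps are assumptions made for illustration; in particular, whether the package computes the median before or after dropping polygons is not stated here.

```r
# Hypothetical toy data: three polygons, two pixels each
polygon_data <- data.frame(area_id = 1:3, response = c(4, NA, 7))
covariate_data <- data.frame(
  area_id = c(1, 1, 2, 2, 3, 3),
  temp    = c(10, NA, 12, 14, 16, 18)
)
aggregation_pixels <- c(100, 50, 60, 20, NA, 40)

# 1. Remove polygons with an NA response, together with their pixels
keep_ids     <- polygon_data$area_id[!is.na(polygon_data$response)]
polygon_data <- polygon_data[polygon_data$area_id %in% keep_ids, ]
keep_pixels  <- covariate_data$area_id %in% keep_ids
covariate_data     <- covariate_data[keep_pixels, ]
aggregation_pixels <- aggregation_pixels[keep_pixels]

# 2. Set NA aggregation pixels to zero
aggregation_pixels[is.na(aggregation_pixels)] <- 0

# 3. Replace covariate NAs with the median of that covariate
#    (assumption: median taken over the remaining pixels)
temp_median <- median(covariate_data$temp, na.rm = TRUE)
covariate_data$temp[is.na(covariate_data$temp)] <- temp_median
```

Handling NAs explicitly like this before calling prepare_data keeps the imputation choices visible, which is the approach the text above recommends.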

### Value

A list of class disag_data is returned. The functions summary, print and plot can be used on disag_data objects. The list contains:

| Element | Description |
|---|---|
| polygon_shapefile | The SpatialPolygonsDataFrame used as an input. |
| covariate_rasters | The RasterStack used as an input. |
| polygon_data | A data frame with columns area_id, response and N (sample size: all NAs unless using binomial data). Each row represents a polygon. |
| covariate_data | A data frame with columns area_id, cell_id and one column for each covariate in covariate_rasters. Each row represents a pixel in a polygon. |
| aggregation_pixels | An array with the value of the aggregation raster for each pixel, in the same order as the rows of covariate_data. |
| coordsForFit | A matrix with two columns of x, y coordinates of pixels within the polygons. Used to make the spatial field. |
| coordsForPrediction | A matrix with two columns of x, y coordinates of pixels in the whole raster. Used to make predictions. |
| startendindex | A matrix with two columns containing the start and end index of the pixels within each polygon. |
| mesh | An INLA mesh to be used for the spatial field of the disaggregation model. |

### Examples


polygons <- list()
for (i in 1:100) {
  row <- ceiling(i/10)
  col <- ifelse(i %% 10 != 0, i %% 10, 10)
  xmin <- 2*(col - 1); xmax <- 2*col; ymin <- 2*(row - 1); ymax <- 2*row
  polygons[[i]] <- rbind(c(xmin, ymax), c(xmax, ymax), c(xmax, ymin), c(xmin, ymin))
}

polys <- do.call(raster::spPolygons, polygons)
response_df <- data.frame(area_id = 1:100, response = runif(100, min = 0, max = 10))
spdf <- sp::SpatialPolygonsDataFrame(polys, response_df)

r <- raster::raster(ncol=20, nrow=20)
r <- raster::setExtent(r, raster::extent(spdf))
r[] <- sapply(1:raster::ncell(r), function(x) rnorm(1, ifelse(x %% 20 != 0, x %% 20, 20), 3))
r2 <- raster::raster(ncol=20, nrow=20)
r2 <- raster::setExtent(r2, raster::extent(spdf))
r2[] <- sapply(1:raster::ncell(r), function(x) rnorm(1, ceiling(x/10), 3))
cov_rasters <- raster::stack(r, r2)

test_data <- prepare_data(polygon_shapefile = spdf,
                          covariate_rasters = cov_rasters)



[Package disaggregation version 0.1.4 Index]