createAQuadtree {AQuadtree} | R Documentation |
Create a Quadtree grid to anonymise spatial point data
Description
createAQuadtree
returns a SpatialPointsDataFrame representing a Quadtree
hierarchical geographic dataset. The resulting grid contains varying size cells
depending on a given threshold and column.
with identifiers
A cellCode
and cellNum
is created for each cell as in INSPIRE
Specification on Geographical Grid Systems.
Usage
createAQuadtree(
points,
dim = 1000,
layers = 5,
colnames = NULL,
threshold = 100,
thresholdField = NULL,
funs = NULL,
as = "Spatial",
ineq.threshold = 0.25,
loss.threshold = 0.4
)
Arguments
points |
object of class "SpatialPoints" or "SpatialPointsDataFrame". |
dim |
a single integer specifying the initial cell sizes in meters, defaults to 1000. |
layers |
a single integer specifying the number of divisions of the initial cells, defaults to 5. |
colnames |
character or character vector specifying the columns to summarise in the resulting quadtree. For columns of class factor, a column for each factor level cill be created. |
threshold |
number. The threshold minimum value each cell must have
in the column |
thresholdField |
character or character vector specifying the
columns to which the |
funs |
character or character vector specifying the summary
functions for each of the |
as |
character indicating return type, if "AQuadtree" a quadtree class element will be returned, otherwise a SpatialPolygonsDataFrame will ber returned. Defaults to "Spatial". |
ineq.threshold |
inequality threshold value to be considered on the disaggregation process. Forces disaggregation under the given inequality threshold. |
loss.threshold |
loss threshold value to be considered on the disaggregation process. Stops disaggregation when there's much loss (i.e loss rate > ineq.threshold ). |
Details
Given a set of points a varying size Quadtree grid is created performing a
bottom-up aggregation considering a minimum threshold for each cell.
Cells with a value under the threshold for the thresholdField
are
aggregated to the upper level in a quadtree manner.
When no thresholdField
is given, total number of points in the cell
will be used, and so, given a threshold of k, none of the cells in the
resulting grid have a value less than k individuals as in a k-anonymity model.
The Quadtree produced balances information loss and accuracy. For instance,
for the set of cells in the left image, where numbers in the cells represent
the values in the thresholdField
, using a threshold
value of 100,
the resulting Quadtree will be the one on the right. As we can see, some cells
will be discarded, and some aggregated to maintain as much information as
possible, keeping at the same time as much disaggregation as possible
The INSPIRE coding system for cell identifiers will be used to generate a
cellCode and cellNum for each cell in the Quadtree.
The objective of the coding system is to generate unique
identifiers for each cell, for any of the resolutions.
The cellCode is a text string, composed of cell size and cell coordinates.
Cell codes start with a cell size prefix. The cell size is denoted in meter (m)
for cell sizes below 1000 m and kilometre (km) for cell sizes from 1000 m and
above.
Examples: a 100 meter cell has an identifier starting with “100m”, the
identifier of a 10000 meter cell starts with “10km”.
The coordinate part of the cell code reflects the distance of the lower left
grid cell corner from the false origin of the CRS. In order to reduce the
length of the string, Easting (E) and Northing (N) values are divided by
10^n (n is the number of zeros in the cell size value). Example for a cell
size of 10000 meters: The number of zeros in the cell size value is 4.
The resulting divider for Easting and Northing values is 10^4 = 10000.
The cellNum is a sequence of concatenated integers identifying all the
hierarchical partitions of the main cell in which the point resides.
For instance, the cellNum of the top right cell would be 416 (fourth
in first partition, sixteenth in second partition)
The input object must be projected and units should be in 'meters'
because the system uses the INSPIRE coding system.
Value
SpatialPolygonsDataFrame representing a varying size Quadtree aggregation for the given points.
See Also
-
D2.8.I.2 INSPIRE Specification on Geographical Grid Systems – Guidelines https://inspire.ec.europa.eu/documents/Data_Specifications/INSPIRE_Specification_GGS_v3.0.1.pdf
-
EEA reference grid dataset https://data.europa.eu/euodp/data/dataset/data_eea-reference-grids-2
Examples
data("CharlestonPop")
aQuadtree.Charleston<-createAQuadtree(CharlestonPop, threshold=10,
colnames="sex", thresholdField=c("sex.male", "sex.female"))