GeoDistMOS {PracTools} | R Documentation
Split geographic PSUs based on a measure of size threshold
Description
Split geographic primary sampling units (PSUs) into new, geographically contiguous PSUs based on a maximum measure of size (MOS) for each PSU.
Usage
GeoDistMOS(lat, long, psuID, n, MOS.var, MOS.takeall = 1, Input.ID = NULL)
Arguments
lat | Latitude variable in an input file. Must be in decimal format.
long | Longitude variable in an input file. Must be in decimal format.
psuID | PSU cluster ID from an input file.
n | Sample size of PSUs; may be a preliminary value used in the computation to identify certainty PSUs.
MOS.var | Variable used for probability proportional to size sampling.
MOS.takeall | Threshold relative measure of size value for certainties; must satisfy 0 <
Input.ID | ID variable from the input file.
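To make the roles of n, MOS.var, and MOS.takeall more concrete, the following base R sketch computes probability-proportional-to-size inclusion probabilities with certainty units, which is the kind of calculation the sampling package's inclusionprobabilities performs. The helper pps_incl_prob, the measures of size, and the use of >= against the threshold are illustrative assumptions, not code taken from PracTools.

```r
# Hypothetical helper, not part of PracTools: PPS inclusion probabilities
# with certainty units, written in base R.
pps_incl_prob <- function(mos, n) {
  stopifnot(all(mos >= 0), n > 0, n <= sum(mos > 0))
  pik <- numeric(length(mos))
  certain <- rep(FALSE, length(mos))
  repeat {
    # Allocate the remaining sample among the non-certainty units
    n_left <- n - sum(certain)
    pik[!certain] <- n_left * mos[!certain] / sum(mos[!certain])
    # Units whose probability reaches 1 become certainties
    new_certain <- !certain & pik >= 1
    if (!any(new_certain)) break
    certain <- certain | new_certain
    pik[certain] <- 1
  }
  pik
}

mos <- c(50, 20, 15, 10, 5)        # made-up PSU measures of size
pik <- pps_incl_prob(mos, n = 3)   # 1.0 0.8 0.6 0.4 0.2

# Assumption: PSUs at or above the threshold are candidates for splitting
takeall <- 0.7
flagged <- pik >= takeall
```

With these made-up values the first PSU is a certainty, and the first two PSUs sit above the illustrative threshold of 0.7.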
Details
GeoDistMOS splits geographic primary sampling units (PSUs) in the input object based on a variable which is used to create the measure of size for each PSU (MOS.var). The goal is to create PSUs of similarly sized MOS. The input file should have one row for each geographic unit, i.e., secondary sampling unit (SSU), with a PSU ID assigned. The latitude and longitude input vectors define the centroid of each input SSU. The complete linkage method for clustering is used. Accordingly, PSUs are split on a distance metric and not on the MOS threshold value. GeoDistMOS calls the function inclusionprobabilities from the sampling package to calculate the inclusion probability for each SSU within a PSU and distHaversine from the geosphere package to calculate the distances between centroids.
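The distance-based splitting described above can be illustrated with a small base R sketch: great-circle distances between SSU centroids, followed by complete linkage clustering. This is not the package's implementation. The helper haversine_miles is a hypothetical stand-in for geosphere's distHaversine, the coordinates are made up, and cutting the tree into two groups is chosen purely for illustration.

```r
# Hypothetical helper, not part of PracTools or geosphere: great-circle
# distance in miles between points given in decimal degrees.
haversine_miles <- function(lat1, lon1, lat2, lon2, r = 3958.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Made-up SSU centroids: two near New York, two near Los Angeles
ssu <- data.frame(
  id   = c("a", "b", "c", "d"),
  lat  = c(40.71, 40.73, 34.05, 34.10),
  long = c(-74.01, -73.99, -118.24, -118.30)
)

# Pairwise distance matrix between centroids
n_ssu <- nrow(ssu)
d <- outer(seq_len(n_ssu), seq_len(n_ssu), function(i, j) {
  haversine_miles(ssu$lat[i], ssu$long[i], ssu$lat[j], ssu$long[j])
})
dimnames(d) <- list(ssu$id, ssu$id)

# Complete linkage: the distance between two clusters is the largest
# pairwise distance between their members
tree <- hclust(as.dist(d), method = "complete")

# Cut the tree into two groups (an illustrative choice)
ssu$new_psu <- cutree(tree, k = 2)
```

Because complete linkage merges on the largest pairwise distance, the resulting groups are compact in space, but nothing in the clustering step itself looks at the measure of size.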
Value
A list with two components:
PSU.ID.Max.MOS | A data frame containing the SSU ID value in character format (
PSU.Max.MOS.Info | A data frame containing the new PSU ID (
Author(s)
George Zipf, Richard Valliant
See Also
Examples
data(Test_Data_US)
# Create PSU ID with GeoDistPSU
g <- GeoDistPSU(Test_Data_US$lat,
                Test_Data_US$long,
                "miles",
                100,
                Input.ID = Test_Data_US$ID)
# Append PSU ID to input file
library(dplyr)
Test_Data_US <- dplyr::inner_join(Test_Data_US, g$PSU.ID, by=c("ID" = "Input.file.ID"))
# Split PSUs with MOS above 0.80
m <- GeoDistMOS(lat = Test_Data_US$lat,
                long = Test_Data_US$long,
                psuID = Test_Data_US$psuID,
                n = 15,
                MOS.var = Test_Data_US$Amount,
                MOS.takeall = 0.80,
                Input.ID = Test_Data_US$ID)
# Create histogram of PSU inclusion probabilities
hist(m$PSU.Max.MOS.Info$psuID.prob,
     breaks = seq(0, 1, 0.1),
     main = "Histogram of PSU Inclusion Probabilities (Certainties = 1)",
     xlab = "Inclusion Probability",
     ylab = "Frequency")