R: Gross Flows estimation

estGF {GFE}

R Documentation

Gross Flows estimation

Description

Estimation of gross flows under complex electoral surveys.

Usage

estGF(
  sampleBase = NULL,
  niter = 100,
  model = NULL,
  colWeights = NULL,
  nonrft = FALSE
)

Arguments

`sampleBase`	An object of class "data.frame" containing the information on electoral candidates. The data must contain the sampling weights.
`niter`	The number of iterations used to estimate the `\eta_{i}` and `p_{ij}` model parameters.
`model`	A character string indicating the model to be used in estimating the gross flows. The available models are "I", "II", "III" and "IV" (see "Details").
`colWeights`	The column name containing the sampling weights to be used in the fitting process.
`nonrft`	A logical value indicating nonresponse at the first time.

Details

The population size N must satisfy the condition:

N = \sum_{j}\sum_{i} N_{ij} + \sum_{j} C_{j} + \sum_{i} R_{i} + M

where N_{ij} is the number of people interviewed who have classification i at the first time and classification j at the second time, R_{i} is the number of people with classification i at the first time who did not respond at the second time, C_{j} is the number of people who did not respond at the first time but responded with classification j at the second time, and M is the number of people who did not respond at either time or could not be reached. Let \eta_{i} be the initial probability that a person has classification i at the first time, and let p_{ij} be the vote transition probability for the cell (i,j), where \sum_{i} \eta_{i} = 1 and \sum_{j} p_{ij} = 1. Four possible models for the gross flows are then given by:

Model I: This model assumes that the initial probability \psi(i,j) that a person in cell (i,j) responds at the first time is the same for everyone, that is, \psi(i,j) = \psi. In addition, the transition probabilities between response and nonresponse do not depend on the classification (i,j), that is, \rho_{MM}(i,j) = \rho_{MM} (a nonrespondent at the first time remains a nonrespondent at the second time) and \rho_{RR}(i,j) = \rho_{RR} (a respondent at the first time responds again at the second time).
Model II: Unlike 'Model I', this model assumes that the initial response probability of a person in cell (i,j) depends only on the classification at the first time, that is, \psi(i,j) = \psi(i).
Model III: Unlike 'Model I', this model assumes that the transition probabilities between response and nonresponse depend only on the classification at the first time, that is, \rho_{MM}(i,j) = \rho_{MM}(i) and \rho_{RR}(i,j) = \rho_{RR}(i).
Model IV: Unlike 'Model I', this model assumes that the transition probabilities between response and nonresponse depend only on the classification at the second time, that is, \rho_{MM}(i,j) = \rho_{MM}(j) and \rho_{RR}(i,j) = \rho_{RR}(j).
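
As a rough illustration of Model I and of the population size condition, the sketch below builds the expected cell proportions for a toy case with two classifications. It follows the same construction used in the Examples section of this page. All numeric values and object names (`joint`, `cells`, `expected`) are hypothetical and chosen only for illustration; this is not the package's internal code.

```r
# Toy inputs (hypothetical values, not taken from the package)
eta <- c(0.6, 0.4)                     # initial classification probabilities, sum to 1
P <- matrix(c(0.7, 0.3,
              0.2, 0.8),
            nrow = 2, byrow = TRUE)    # transition probabilities, rows sum to 1
psi   <- 0.9   # probability of responding at the first time
rhoRR <- 0.8   # a first-time respondent responds again at the second time
rhoMM <- 0.5   # a first-time nonrespondent stays a nonrespondent

# eta_i * p_ij: eta is recycled down the columns, so row i is scaled by eta[i]
joint <- eta * P

cells <- matrix(NA_real_, nrow = 3, ncol = 3,
                dimnames = list(c("1", "2", "NR"), c("1", "2", "NR")))
cells[1:2, 1:2] <- psi * rhoRR * joint                       # N_ij cells
cells[1:2, 3]   <- psi * (1 - rhoRR) * rowSums(joint)        # R_i cells
cells[3, 1:2]   <- (1 - psi) * (1 - rhoMM) * colSums(joint)  # C_j cells
cells[3, 3]     <- (1 - psi) * rhoMM                         # M cell

# Expected counts for a population of size N, and the size condition
N <- 1000
expected <- cells * N
total <- sum(expected[1:2, 1:2]) + sum(expected[3, 1:2]) +
  sum(expected[1:2, 3]) + expected[3, 3]
stopifnot(isTRUE(all.equal(sum(cells), 1)),
          isTRUE(all.equal(total, N)))
```

Under Models II to IV the scalars `psi`, `rhoRR` and `rhoMM` would be replaced by vectors indexed by i or j, with the same overall layout of the table.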

Value

estGF returns a list containing:

Est.CIV: a data.frame containing the gross flows estimation.
Params.Model: a list that contains the \hat{\eta}_{i}, \hat{p}_{ij}, \hat{\psi}(i,j), \hat{\rho}_{RR}(i,j), \hat{\rho}_{MM}(i,j) parameters for the estimated model.
Sam.Est: a list containing the sampling estimators \hat{N}_{ij}, \hat{R}_{i}, \hat{C}_{j}, \hat{M}, \hat{N}.
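
To give a feel for what the sampling estimators in Sam.Est represent, the sketch below computes weighted cell totals from a small made-up sample. It assumes Horvitz-Thompson-type estimators (weights equal to the inverse of the inclusion probabilities); that is an assumption made here for illustration, and the package's internal computation may differ. The data and the column names `t0`, `t1` and `pik` are hypothetical.

```r
# Hypothetical sample: classification at each time (NA = nonresponse)
# and inclusion probabilities. Not the structure required by estGF.
sam <- data.frame(
  t0  = c("A", "A", "B", NA,  "B", NA),
  t1  = c("A", "B", "B", "A", NA,  NA),
  pik = c(0.5, 0.25, 0.5, 0.2, 0.1, 0.25),
  stringsAsFactors = FALSE
)
sam$w <- 1 / sam$pik   # design weights (assumed Horvitz-Thompson weighting)

# Treat nonresponse as its own category so it shows up in the table
lev <- c("A", "B", "Non_Resp")
sam$f0 <- factor(ifelse(is.na(sam$t0), "Non_Resp", sam$t0), levels = lev)
sam$f1 <- factor(ifelse(is.na(sam$t1), "Non_Resp", sam$t1), levels = lev)

# Weighted cross-tabulation, converted to a plain numeric matrix
tab <- unclass(xtabs(w ~ f0 + f1, data = sam))
attr(tab, "call") <- NULL

N_ij_hat <- tab[1:2, 1:2]                 # responded at both times
R_hat    <- tab[1:2, "Non_Resp"]          # responded only at the first time
C_hat    <- tab["Non_Resp", 1:2]          # responded only at the second time
M_hat    <- tab["Non_Resp", "Non_Resp"]   # responded at neither time
N_hat    <- sum(tab)

# The estimated totals satisfy the same decomposition as N in Details
stopifnot(isTRUE(all.equal(N_hat,
                           sum(N_ij_hat) + sum(C_hat) + sum(R_hat) + M_hat)))
```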

References

Stasny, E. (1987), ‘Some Markov-chain models for nonresponse in estimating gross’, Journal of Official Statistics 3, pp. 359-373.
Sarndal, C.-E., Swensson, B. & Wretman, J. (1992), Model Assisted Survey Sampling, Springer-Verlag, New York, USA.
Gutierrez, A., Trujillo, L. & Silva, N. (2014), ‘The estimation of gross flows in complex surveys with random nonresponse’, Survey Methodology 40(2), pp. 285-321.

Examples

library(GFE)
library(TeachingSampling)
library(data.table)
# Colombia's electoral candidates in 2014
candidates_t0 <- c("Clara","Enrique","Santos","Martha","Zuluaga","WhiteVote", "NoVote")
candidates_t1 <- c("Santos","Zuluaga","WhiteVote", "NoVote")

N <- 100000
nCanT0 <- length(candidates_t0)
nCanT1 <- length(candidates_t1)
# Initial probabilities
eta <- matrix(c(0.10, 0.10, 0.20, 0.17, 0.28, 0.10, 0.05),
              byrow = TRUE, nrow = nCanT0)
# Transition probabilities (one row per first-time classification; rows sum to 1)
P <- matrix(c(0.10, 0.60, 0.15, 0.15,
              0.30, 0.10, 0.25, 0.35,
              0.34, 0.25, 0.16, 0.25,
              0.25, 0.05, 0.35, 0.35,
              0.10, 0.25, 0.45, 0.20,
              0.12, 0.36, 0.22, 0.30,
              0.10, 0.15, 0.30, 0.45),
            byrow = TRUE, nrow = nCanT0)
# Simulated counts for each (first time, second time) cell
citaMod <- matrix(NA, ncol = nCanT1, nrow = nCanT0)
row.names(citaMod) <- candidates_t0
colnames(citaMod) <- candidates_t1

for (ii in seq_len(nCanT0)) {
  # round() guards against floating-point error in N * eta
  citaMod[ii, ] <- c(rmultinom(1, size = round(N * eta[ii, ]), prob = P[ii, ]))
}

# Model I
psiI   <- 0.9
rhoRRI <- 0.9
rhoMMI <- 0.5

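# Expected cell proportions under Model I, with an extra row and column for nonresponse
citaModI <- matrix(nrow = nCanT0 + 1, ncol = nCanT1 + 1)
rownames(citaModI) <- c(candidates_t0, "Non_Resp")
colnames(citaModI) <- c(candidates_t1, "Non_Resp")
# Responded at both times
citaModI[1:nCanT0, 1:nCanT1] <- P * c(eta) * rhoRRI * psiI
# Did not respond at either time
citaModI[(nCanT0 + 1), (nCanT1 + 1)] <- rhoMMI * (1 - psiI)
# Responded at the first time only
citaModI[1:nCanT0, (nCanT1 + 1)] <- (1 - rhoRRI) * psiI * rowSums(P * c(eta))
# Responded at the second time only
citaModI[(nCanT0 + 1), 1:nCanT1] <- (1 - rhoMMI) * (1 - psiI) * colSums(P * c(eta))
# Scale to counts that still sum to N, then expand to one row per person
citaModI <- round_preserve_sum(citaModI * N)
DBcitaModI <- createBase(citaModI)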

# Creating auxiliary information
DBcitaModI[,AuxVar := rnorm(nrow(DBcitaModI), mean = 45, sd = 10)]

# Selects a sample with unequal probabilities
res <- S.piPS(n = 3200, as.data.frame(DBcitaModI)[,"AuxVar"])
sam <- res[,1]
pik <- res[,2]
DBcitaModISam <- copy(DBcitaModI[sam,])
DBcitaModISam[,Pik := pik]

# Gross Flows estimation
estima <- estGF(sampleBase = DBcitaModISam, niter = 500, model = "I", colWeights = "Pik")
estima

[Package GFE version 0.1.1 Index]