BPF {eiCircles}    R Documentation
Ecological Inference of RxC Tables by Overdispersed-Multinomial Models
Description
Implements the model proposed in Forcina et al. (2012), an extension of Brown and Payne (1986), to estimate RxC vote transfer matrices (ecological contingency tables). It allows covariates to be incorporated.
Usage
BPF(
X,
Y,
local = "lik",
covariates = NULL,
census.changes = "adjust1",
stable.units = TRUE,
stability.par = 0.12,
confidence = 0.95,
cs = 50,
null.cells = NULL,
row.cells.relationships = NULL,
row.cells.relationships.C = NULL,
pair.cells.relationships = NULL,
cells.fixed.logit = NULL,
dispersion.rows = data.frame(row1 = rep(1L, ncol(X) - 1L), row2 = 2:ncol(X)),
start.values = NULL,
seed = NULL,
max.iter = 100,
tol = 1e-04,
verbose = FALSE,
save.beta = FALSE,
...
)
Arguments
X |
matrix (or data.frame) of order KxR with either the electoral results recorded in election 1 or the sum across columns (the margins of row options) of the K ecological tables. |
Y |
matrix (or data.frame) of order KxC with either the electoral results recorded in election 2 or the sum across rows (the margins of column options) of the K ecological tables. |
local |
A character string indicating the algorithm to be used for adjusting the
estimates of the transition probabilities obtained for the whole area (electoral space)
with the actual observations available in each local unit. Only |
covariates |
A list with two components, |
census.changes |
A character string indicating how census changes between elections should be
handled. At present, it admits only two values |
stable.units |
A |
stability.par |
A non-negative number that sets the maximum relative change in the total census of a unit for the unit to be considered stable. Default, 0.12. The relative change is measured as the absolute difference between the logarithms of the unit's sizes (censuses) in the two elections; measuring it this way makes it independent of which election is taken as reference. |
confidence |
A number between 0 and 1 to be used as level of confidence for the confidence intervals of the transition
probabilities ( |
cs |
A positive number indicating the average cluster size. Default, 50. |
null.cells |
A matrix (or data.frame) with two columns (row, column) identifying the cells whose probabilities
should be constrained to zero. Cells can be identified by position or by name. For instance, (2, 3)
means that the probability corresponding to cell (2, 3) of the transfer matrix is constrained to
be zero. Likewise, (“party1”, “party2”) means that the transfer probability from “party1” (in |
row.cells.relationships |
A matrix (or data.frame) with four columns (row, column1, column2, constant) that may be used to assign a
pre-specified value to the ratio between the transition probabilities of two cells
within the same row. Because the model takes the cell in column2 as reference to define this constraint,
column1 and column2 must both differ from the last column, which is already used as reference to define the logits.
Rows and columns can be identified by position or by name. For instance,
(2, 3, 5, 0.5) means that the probability corresponding to cell (2, 3) of the transfer
matrix is constrained to be equal to 0.5 times the probability corresponding to cell (2, 5)
of the transfer matrix. Because each cell defined by (row, column2) is used as reference for
the corresponding cell (row, column1), it is removed, and thus that cell cannot act as reference in two different constraints.
Hence, constraints involving the same cell should be defined with care.
Specifically, the cells defined by (row, column2) should not appear in other constraints. For instance, if in the i-th row you want to constrain
(cell 3) = (cell 1) x 0.6 and (cell 3) = (cell 2) x 0.3, you need to specify it as
(cell 3) = (cell 1) x 0.6 and (cell 2) = (cell 1) x 2. See |
row.cells.relationships.C |
A matrix (or data.frame) with three columns (row, column, constant) specifying
constraints analogous to those described in |
pair.cells.relationships |
This is a less stringent version of the argument |
cells.fixed.logit |
A matrix (or data.frame) with three columns (row, column, number) identifying the cells with
a fixed value for the logit of their probability; this does not set the
actual transition probability but its ratio with respect to the reference category. For instance, (2, 3, -5) means
that the logit of the probability corresponding to cell (2, 3) of the transfer matrix is constrained to
be -5. See |
dispersion.rows |
A matrix (or data.frame) with two columns (row1, row2) in which each row indicates a pair of rows that should
have equal overdispersions. By default, overdispersions are assumed to be the same in all rows:
data.frame(row1 = rep(1L, ncol(X) - 1L), row2 = 2:ncol(X)). |
start.values |
A vector of length |
seed |
A number indicating the random seed to be used. Default, |
max.iter |
A positive integer. Maximum number of iterations to be performed by the Fisher scoring algorithm during maximum likelihood estimation. Default, 100. |
tol |
Maximum value allowed for the numerical estimates of the partial derivatives of the likelihood at the point of convergence. Default, 0.0001. |
verbose |
A |
save.beta |
A |
... |
Other arguments to be passed to the function. Not currently used. |
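The stability rule described for stability.par can be sketched in base R as follows. This is a minimal illustration, not the package's internal code: the helper name flag_stable_units is hypothetical, and it assumes that the row sums of X and Y give the census of each unit in the two elections. The last lines simply evaluate the default value of dispersion.rows shown in Usage for a toy input with R = 4 row options, which pairs every row with row 1.

```r
# Hypothetical helper (not part of eiCircles). Assumption: the row sums of
# X and Y are the censuses of each unit in elections 1 and 2.
flag_stable_units <- function(X, Y, stability.par = 0.12) {
  N1 <- rowSums(as.matrix(X))
  N2 <- rowSums(as.matrix(Y))
  # |log(N2) - log(N1)| = |log(N2 / N1)|: the same whichever election is the reference
  rel.change <- abs(log(N2) - log(N1))
  rel.change <= stability.par
}

X <- data.frame(A = c(60, 40), B = c(40, 60))  # censuses 100 and 100
Y <- data.frame(A = c(70, 90), B = c(40, 60))  # censuses 110 and 150
flag_stable_units(X, Y)  # TRUE FALSE: log(1.1) is about 0.095, log(1.5) about 0.405

# Default of dispersion.rows (from Usage) for a toy input with R = 4 options
X4 <- matrix(1, nrow = 3, ncol = 4)
default.dispersion <- data.frame(row1 = rep(1L, ncol(X4) - 1L), row2 = 2:ncol(X4))
default.dispersion  # pairs (1, 2), (1, 3), (1, 4): all rows tied to row 1
```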
Details
A more detailed description of how to define constraints.
Defining constraints properly is somewhat tricky. First of all, it is the user's responsibility to define constraints that are mutually compatible; the function does not check whether they are jointly consistent. It is important to be aware that each linear constraint, when implemented, requires an element of the vector of internal parameters to be set to a known value and the corresponding column of the (underlying) design matrix to be removed. In addition, certain constraints are implemented by replacing one or more columns of the design matrix with suitable linear combinations of the columns that correspond to the cells involved in the constraint. A warning is issued when two or more constraints require removing the same column of the design matrix. To avoid conflicting constraints, a safe rule is to make different constraints act on disjoint sets of cells.
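The mechanism just described (fixing an internal parameter at a known value and dropping its column from the design matrix) can be illustrated with a toy example. This is a sketch under simplifying assumptions, not the package's internal code: no covariates, an identity design matrix with one column per cell (row, column) for every column except the last, which is the reference, cells listed by row, and an arbitrary constraint fixing the logit of cell (2, 1) at -5. The known contribution of the fixed parameter moves into an offset, so only the remaining parameters stay free.

```r
# Toy sizes (assumption): 2 row options and 3 column options
n.row <- 2; n.col <- 3
# One internal parameter per cell (row, column) with column < n.col, listed by row
cells <- expand.grid(column = 1:(n.col - 1), row = 1:n.row)
D <- diag(n.row * (n.col - 1))  # simplest case assumed: identity design, no covariates
colnames(D) <- paste(cells$row, cells$column, sep = ",")  # "1,1" "1,2" "2,1" "2,2"

# Constraint: the logit of cell (2, 1) is fixed at the known value -5
j <- which(colnames(D) == "2,1")
offset <- D[, j] * (-5)             # known contribution of the fixed parameter
D.reduced <- D[, -j, drop = FALSE]  # its column is removed from the design matrix

beta.free <- c(0.3, -0.2, 0.1)      # arbitrary values of the remaining parameters
eta <- as.vector(D.reduced %*% beta.free + offset)
eta  # 0.3 -0.2 -5.0 0.1: the fixed logit is reproduced through the offset
```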
For each type of constraint, we specify below which column of the design matrix is removed and, when a linear combination is needed, how it is defined. Note that, in the unconstrained model, the design matrix has a column for each cell of the transition probabilities, listed by row, except for the cells in the last column, which is used as reference:
- null.cells: The column of the design matrix corresponding to the cell defined by the 'row' and 'column' declared in the constraint is removed.
- row.cells.relationships: The column of the design matrix corresponding to the cell (row, column2) is removed, while the one corresponding to the cell (row, column1) is adjusted.
- row.cells.relationships.C: The column of the design matrix corresponding to the cell determined by each ('row', 'column') pair is removed.
- pair.cells.relationships: This constraint is defined by four (row, column) pairs; the column of the design matrix corresponding to the last pair (row2, column2.2) is removed and the others are adjusted.
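On the logit scale used by the model, where each row has C - 1 logits log(p_j / p_C) relative to the last column C, several of the constraints above become simple statements about the logits. The sketch below, with a hypothetical helper row_probs and arbitrary values, checks three of them: a null cell has logit -Inf and hence probability zero; a logit fixed at -5 fixes the ratio p_j / p_C at exp(-5); and a ratio constraint p(column1) = k x p(column2) is the linear constraint logit(column1) = log(k) + logit(column2). The last lines verify the arithmetic of the row.cells.relationships example: if cell 3 = 0.6 x cell 1 and cell 3 = 0.3 x cell 2, then cell 2 = (0.6 / 0.3) x cell 1 = 2 x cell 1.

```r
# Hypothetical helper (not part of eiCircles): the C probabilities of one row
# from its C - 1 logits, logit j = log(p_j / p_C), with column C as reference.
row_probs <- function(logits) {
  w <- c(exp(logits), 1)  # the reference column has weight exp(0) = 1
  w / sum(w)
}

# Row with C = 5 columns: cell 2 is a null cell, cell 4 has its logit fixed at -5
p <- row_probs(c(0.4, -Inf, 0.1, -5))
p[2]          # exactly 0, because exp(-Inf) = 0
p[4] / p[5]   # exp(-5), about 0.0067

# Ratio constraint p[3] = 0.6 * p[1], written on the logit scale
q <- row_probs(c(0.4, 0, log(0.6) + 0.4, 0.2))
q[3] / q[1]   # 0.6

# Arithmetic of the worked example: cell 3 = 0.6 x cell 1 and cell 3 = 0.3 x cell 2
cell1 <- 0.2
cell3 <- 0.6 * cell1
cell2 <- cell3 / 0.3
all.equal(cell2, 2 * cell1)  # TRUE: cell 2 = 2 x cell 1
```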
Value
A list with the following components
TM |
The estimated RxC table (matrix) of transition probabilities/rates. This coincides with |
TM.votes |
The estimated RxC table (matrix) of votes corresponding to |
TP |
The estimated RxC table (matrix) of underlying transition probabilities obtained after applying the approach in Forcina et al. (2012) with the specified model. |
TR |
When |
TR.units |
When |
TR.votes.units |
When |
TP.lower |
A matrix of order RxC with the estimated lower limits of the confidence intervals, based on a normal approximation,
of the underlying transition probabilities ( |
TP.upper |
A matrix of order RxC with the estimated upper limits of the confidence intervals, based on a normal approximation,
of the underlying transition probabilities ( |
beta |
The estimated vector of internal parameters (logits) at convergence.
The first |
overdispersion |
The estimated vector of internal overdispersion parameters at convergence, on a scale from 0 to 1. |
sd.TP |
Estimated standard deviations of the estimated transition probabilities. |
sd.beta |
The estimated standard errors of the elements of beta. |
cov.beta |
The estimated covariance matrix of beta. It may be used to compute approximate variances of transformations of the beta parameters, such as transition probabilities. |
madis |
A vector of length K with discrepancies of individual local units based on the Mahalanobis measure. It is essentially the quadratic discrepancy between observed and estimated votes weighted by the inverse of the estimated variance. |
lk |
The value of the log-likelihood at convergence. |
selected.units |
A vector with the indexes corresponding to the units finally selected to estimate the vote transition probability matrix. |
iter |
An integer indicating the number of iterations performed before convergence or before the algorithm was stopped. |
inputs |
A list containing all the objects with the values used as arguments by the function. |
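Three of the components above involve standard computations that can be sketched in base R. These are illustrations under stated assumptions, not the package's internal code, and all helper names and inputs are hypothetical: (1) symmetric normal-approximation limits TP ± z·sd.TP with z = qnorm(1 - (1 - confidence)/2), assuming the intervals are built on the probability scale; (2) a delta-method approximation to the covariance of one row's probabilities from the covariance of that row's C - 1 logits, as suggested for cov.beta (extracting that sub-matrix from cov.beta depends on the ordering of beta, which is not shown here); and (3) a Mahalanobis-type discrepancy r'V^{-1}r for one unit, assuming a nonsingular covariance V (for instance, after dropping one redundant category).

```r
# (1) Normal-approximation limits on the probability scale (assumption)
normal_ci <- function(TP, sd.TP, confidence = 0.95) {
  z <- qnorm(1 - (1 - confidence) / 2)  # about 1.96 when confidence = 0.95
  list(lower = TP - z * sd.TP, upper = TP + z * sd.TP)
}
TP <- matrix(c(0.5, 0.2, 0.5, 0.8), nrow = 2)  # toy 2x2 table, rows sum to 1
sd.TP <- matrix(c(0.1, 0.05, 0.1, 0.05), nrow = 2)
ci <- normal_ci(TP, sd.TP)

# (2) Delta method: cov(p) is approximately J Sigma J', with J = dp/db for one row
delta_row_cov <- function(b, Sigma) {
  p <- c(exp(b), 1) / (1 + sum(exp(b)))  # row probabilities, last column as reference
  J <- (diag(p) - tcrossprod(p))[, seq_along(b), drop = FALSE]
  J %*% Sigma %*% t(J)
}
V.p <- delta_row_cov(b = c(0.2, -0.5), Sigma = matrix(c(0.04, 0.01, 0.01, 0.09), 2))
sqrt(diag(V.p))  # approximate standard deviations of the 3 probabilities

# (3) Quadratic discrepancy between observed and estimated votes of one unit
madis_unit <- function(y, y.hat, V) {
  r <- y - y.hat
  as.numeric(crossprod(r, solve(V, r)))  # r' V^{-1} r
}
madis_unit(y = c(10, 5), y.hat = c(8, 6), V = diag(c(4, 2)))  # 4/4 + 1/2 = 1.5
```

With a plain symmetric interval, limits can fall below 0 or above 1 for probabilities close to the boundaries; how the package handles such cases is not stated here.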
Note
Constraints may be used to force estimates to take values different from those obtained by unconstrained estimation. As such, these tools should be used sparingly, essentially to assess whether estimates differ substantially (significantly) from what we would expect, or whether unexpected estimates are merely due to random variation. To a first-order approximation, when a single constraint is imposed, twice the difference between the unconstrained and the constrained log-likelihood should be distributed as a chi-square with 1 degree of freedom. This makes it possible to test which constraints are in substantial conflict with the data.
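The likelihood-ratio comparison described in this Note can be sketched as follows. The helper lr_test is hypothetical; its inputs would be the lk components of an unconstrained and a constrained fit, and the numbers used here are toy values, not results from real data. A p-value below a chosen level (for example, 0.05) would suggest that the constraint is in substantial conflict with the data.

```r
# Hypothetical helper: likelihood-ratio test for one constraint (1 degree of freedom)
lr_test <- function(lk.unconstrained, lk.constrained, df = 1) {
  stat <- 2 * (lk.unconstrained - lk.constrained)
  c(statistic = stat, p.value = pchisq(stat, df = df, lower.tail = FALSE))
}
# With real fits, the inputs would be, e.g. (assuming eiCircles is attached;
# the column names of the null.cells data.frame are an assumption):
#   fit.u <- BPF(votes1, votes2)
#   fit.c <- BPF(votes1, votes2, null.cells = data.frame(row = 2, column = 3))
#   lr_test(fit.u$lk, fit.c$lk)
lr_test(-100.0, -103.2)  # statistic 6.4, p-value about 0.011
```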
Author(s)
Antonio Forcina, forcinarosara@gmail.com
Jose M. Pavia, pavia@uv.es
References
Brown, P. and Payne, C. (1986). Aggregate data, ecological regression and voting transitions. Journal of the American Statistical Association, 81, 453–460. doi:10.1080/01621459.1986.10478290
Forcina, A., Gnaldi, M. and Bracalente, B. (2012). A revised Brown and Payne model of voting behaviour applied to the 2009 elections in Italy. Statistical Methods & Applications, 21, 109–119. doi:10.1007/s10260-011-0184-x
Examples
library(eiCircles)
# Election 1: votes for 7 options (P1 to P7) in 10 local units
votes1 <- structure(list(P1 = c(16L, 4L, 13L, 6L, 1L, 16L, 6L, 17L, 48L, 14L),
P2 = c(8L, 3L, 0L, 5L, 1L, 4L, 7L, 6L, 28L, 8L),
P3 = c(38L, 11L, 11L, 3L, 13L, 39L, 14L, 34L, 280L, 84L),
P4 = c(66L, 5L, 18L, 39L, 30L, 57L, 35L, 65L, 180L, 78L),
P5 = c(14L, 0L, 5L, 2L, 4L, 21L, 6L, 11L, 54L, 9L),
P6 = c(8L, 2L, 5L, 3L, 0L, 7L, 7L, 11L, 45L, 17L),
P7 = c(7L, 3L, 5L, 2L, 3L, 17L, 7L, 13L, 40L, 8L)),
row.names = c(NA, 10L), class = "data.frame")
# Election 2: votes for 5 options (C1 to C5) in the same 10 local units
votes2 <- structure(list(C1 = c(2L, 1L, 2L, 2L, 0L, 4L, 0L, 4L, 19L, 14L),
C2 = c(7L, 3L, 1L, 7L, 2L, 5L, 3L, 10L, 21L, 6L),
C3 = c(78L, 7L, 28L, 42L, 28L, 84L, 49L, 85L, 260L, 100L),
C4 = c(56L, 14L, 20L, 7L, 19L, 54L, 22L, 50L, 330L, 91L),
C5 = c(14L, 3L, 6L, 2L, 3L, 14L, 8L, 8L, 45L, 7L)),
row.names = c(NA, 10L), class = "data.frame")
# Estimated 7x5 transition matrix, using IPF for the local adjustment
example <- BPF(votes1, votes2, local = "IPF")$TM