Frechet.bounds.cat {StatMatch} | R Documentation |
Frechet bounds of cells in a contingency table
Description
This function derives bounds for the cell probabilities of the table Y vs. Z, starting from the marginal tables (X vs. Y) and (X vs. Z) and the joint distribution of the X variables.
Usage
Frechet.bounds.cat(tab.x, tab.xy, tab.xz, print.f="tables", align.margins = FALSE,
tol= 0.001, warn = TRUE)
Arguments
tab.x |
An R table crossing the X variables. This table must be obtained by using the function |
tab.xy |
An R table of X vs. Y variable. This table must be obtained by using the function A single categorical Y variable is allowed. One or more categorical variables can be considered as X variables (common variables). Obviously, the same X variables in When |
tab.xz |
An R table of X vs. Z variable. This table must be obtained by using the function A single categorical Z variable is allowed. One or more categorical variables can be considered as X variables (common variables). The same X variables in When |
print.f |
A string: when |
align.margins |
Logical (default |
tol |
Tolerance used in comparing joint distributions as far as X variables are considered (default |
warn |
Logical, when |
Details
This function computes the expected conditional Frechet bounds for the relative frequencies in the contingency table of Y vs. Z, starting from the distributions P(Y|X), P(Z|X) and P(X). The expected conditional bounds for the relative frequencies p_{Y=j,Z=k} in the table Y vs. Z are:
p^{(low)}_{Y=j,Z=k} = \sum_{i} p_{X=i}\max (0; p_{Y=j|X=i} + p_{Z=k|X=i} - 1 )
p^{(up)}_{Y=j,Z=k} = \sum_{i} p_{X=i} \min ( p_{Y=j|X=i}; p_{Z=k|X=i})
The relative frequencies p_{X=i}=n_i/n are computed from the frequencies in tab.x; the relative frequencies p_{Y=j|X=i}=n_{ij}/n_{i+} are derived from tab.xy; finally, p_{Z=k|X=i}=n_{ik}/n_{i+} are derived from tab.xz.
Estimation requires that all the starting tables share the same marginal distribution of the X variables.
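The two conditional bound formulas can be illustrated with a minimal base R sketch. The counts below are hypothetical toy data with a single X variable, and the object names only mirror the argument and output names on this page; this is not the package's internal code.

```r
# Hypothetical toy counts: X has 2 levels, Y and Z have 2 levels each.
# Both tables share the same X margin (40, 60), as the text requires.
tab.xy <- matrix(c(30, 10,
                   20, 40), nrow = 2, byrow = TRUE,
                 dimnames = list(X = c("x1", "x2"), Y = c("y1", "y2")))
tab.xz <- matrix(c(25, 15,
                   10, 50), nrow = 2, byrow = TRUE,
                 dimnames = list(X = c("x1", "x2"), Z = c("z1", "z2")))
tab.x <- rowSums(tab.xy)

p.x   <- tab.x / sum(tab.x)       # p_{X=i}
p.y.x <- prop.table(tab.xy, 1)    # p_{Y=j|X=i}
p.z.x <- prop.table(tab.xz, 1)    # p_{Z=k|X=i}

# Accumulate the weighted bounds over the categories of X
low.cx <- up.cx <- matrix(0, ncol(tab.xy), ncol(tab.xz),
                          dimnames = list(Y = colnames(tab.xy),
                                          Z = colnames(tab.xz)))
for (i in seq_along(p.x)) {
  s <- outer(p.y.x[i, ], p.z.x[i, ], "+") - 1
  low.cx <- low.cx + p.x[i] * pmax(s, 0)
  up.cx  <- up.cx  + p.x[i] * outer(p.y.x[i, ], p.z.x[i, ], pmin)
}
low.cx
up.cx
```

Each cell of low.cx is no larger than the matching cell of up.cx, since within every category of X the lower Frechet bound cannot exceed the upper one.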
The function also returns the unconditional bounds for the relative frequencies in the contingency table of Y vs. Z, i.e. the bounds computed without considering the X variables:
\max\{0; p_{Y=j} + p_{Z=k} - 1\} \leq p_{Y=j,Z=k} \leq \min \{ p_{Y=j}; p_{Z=k}\}
These bounds are the only output when tab.x = NULL.
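As a rough illustration of the unconditional bounds, the following base R sketch applies the inequality to hypothetical marginal distributions of Y and Z. The names p.y, p.z, low.u and up.u are chosen to echo this page and are assumptions, not the function's internals.

```r
# Hypothetical marginal distributions of Y and Z
p.y <- c(y1 = 0.50, y2 = 0.50)
p.z <- c(z1 = 0.35, z2 = 0.65)

# Lower bound: max(0, p_{Y=j} + p_{Z=k} - 1) for every cell (j, k)
low.u <- pmax(outer(p.y, p.z, "+") - 1, 0)
# Upper bound: min(p_{Y=j}, p_{Z=k}) for every cell (j, k)
up.u  <- outer(p.y, p.z, pmin)

low.u
up.u
```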
Finally, the contingency table of Y vs. Z estimated under the Conditional Independence Assumption (CIA) is obtained by summing over the categories of the X variables:
p_{Y=j,Z=k} = \sum_{i} p_{X=i} \times p_{Y=j|X=i} \times p_{Z=k|X=i}.
When tab.x = NULL, the expected table under the assumption of independence between Y and Z is also provided:
p_{Y=j,Z=k} = p_{Y=j} \times p_{Z=k}.
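The two point estimates can be sketched in base R as follows. The probabilities are hypothetical toy values, and the cell of the Y vs. Z table under the CIA is taken as the sum over the categories of X of the product of the two conditional probabilities weighted by p_{X=i}. The object names are illustrative assumptions.

```r
# Hypothetical toy distributions: X, Y and Z have 2 levels each
p.x   <- c(x1 = 0.4, x2 = 0.6)                        # p_{X=i}
p.y.x <- matrix(c(0.75, 0.25,
                  1/3,  2/3), nrow = 2, byrow = TRUE,
                dimnames = list(X = c("x1", "x2"), Y = c("y1", "y2")))
p.z.x <- matrix(c(0.625, 0.375,
                  1/6,   5/6), nrow = 2, byrow = TRUE,
                dimnames = list(X = c("x1", "x2"), Z = c("z1", "z2")))

# Table under the CIA: weighted sum of the within-X products
cia <- matrix(0, 2, 2, dimnames = list(Y = c("y1", "y2"), Z = c("z1", "z2")))
for (i in seq_along(p.x)) {
  cia <- cia + p.x[i] * outer(p.y.x[i, ], p.z.x[i, ])
}

# Table under plain independence of Y and Z (no X variables)
p.y <- colSums(p.x * p.y.x)    # marginal distribution of Y
p.z <- colSums(p.x * p.z.x)    # marginal distribution of Z
ind <- outer(p.y, p.z)

cia
ind
```

Both tables sum to 1 and reproduce the same margins for Y and Z; they differ only in how the mass is spread across the cells.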
Many zero cells in the input contingency tables indicate sparseness. Sparse tables make it hard to estimate the cell relative frequencies needed to derive the bounds, and the corresponding results may be unreliable. One alternative is to estimate the required parameters with a pseudo-Bayes estimator (see pBayes); in practice, the input tab.x, tab.xy and tab.xz should then be the tables provided by the pBayes function.
Value
When print.f="tables" (default), a list with the following components:
low.u |
The estimated lower bounds for the relative frequencies in the table Y vs. Z without conditioning on the X variables. |
up.u |
The estimated upper bounds for the relative frequencies in the table Y vs. Z without conditioning on the X variables. |
CIA |
The estimated relative frequencies in the table Y vs. Z under the Conditional Independence Assumption (CIA). |
low.cx |
The estimated lower bounds for the relative frequencies in the table Y vs. Z when conditioning on the X variables. |
up.cx |
The estimated upper bounds for the relative frequencies in the table Y vs. Z when conditioning on the X variables. |
uncertainty |
The uncertainty associated with the input data, measured as the average width of the uncertainty bounds with and without conditioning on the X variables. |
When print.f="data.frame", the output list contains just two components:
bounds |
A data.frame whose columns report the estimated uncertainty bounds. |
uncertainty |
The uncertainty associated with the input data, measured as the average width of the uncertainty bounds with and without conditioning on the X variables. |
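One plausible reading of "average width of uncertainty bounds" is the mean, over all cells of the Y vs. Z table, of the difference between the upper and lower bound. The sketch below follows that reading with hypothetical bound values; the name av.width is illustrative, and the exact formula used by the function is not stated on this page.

```r
# Hypothetical lower and upper bounds for a 2 x 2 table of Y vs. Z
low <- matrix(c(0.15, 0.00, 0.15, 0.30), nrow = 2,
              dimnames = list(Y = c("y1", "y2"), Z = c("z1", "z2")))
up  <- matrix(c(0.35, 0.20, 0.35, 0.50), nrow = 2,
              dimnames = list(Y = c("y1", "y2"), Z = c("z1", "z2")))

# Average width of the intervals [low, up] across all cells
av.width <- mean(up - low)
av.width
```

Under this reading, a smaller value means the common X variables leave less uncertainty about the joint distribution of Y and Z.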
Author(s)
Marcello D'Orazio mdo.statmatch@gmail.com
References
D'Orazio, M., Di Zio, M. and Scanu, M. (2006) “Statistical Matching for Categorical Data: Displaying Uncertainty and Using Logical Constraints”, Journal of Official Statistics, 22, pp. 137–157.
D'Orazio, M., Di Zio, M. and Scanu, M. (2006). Statistical Matching: Theory and Practice. Wiley, Chichester.
See Also
Examples
library(StatMatch) # provides Frechet.bounds.cat() and comp.prop()
data(quine, package="MASS") # loads quine from MASS
str(quine)
# split quine in two subsets
suppressWarnings(RNGversion("3.5.0"))
set.seed(7654)
lab.A <- sample(nrow(quine), 70, replace=TRUE)
quine.A <- quine[lab.A, 1:3]
quine.B <- quine[-lab.A, 2:4]
# compute the tables required by Frechet.bounds.cat()
freq.xA <- xtabs(~Sex+Age, data=quine.A)
freq.xB <- xtabs(~Sex+Age, data=quine.B)
freq.xy <- xtabs(~Sex+Age+Eth, data=quine.A)
freq.xz <- xtabs(~Sex+Age+Lrn, data=quine.B)
# apply Frechet.bounds.cat()
bounds.yz <- Frechet.bounds.cat(tab.x=freq.xA+freq.xB, tab.xy=freq.xy,
tab.xz=freq.xz, print.f="data.frame")
bounds.yz
# harmonize distr. of Sex vs. Age during computations
# in Frechet.bounds.cat()
# compare marg. distribution of Xs in A and B vs. pooled estimate
comp.prop(p1=margin.table(freq.xy,c(1,2)), p2=freq.xA+freq.xB,
          n1=nrow(quine.A), n2=nrow(quine.A)+nrow(quine.B), ref=TRUE)
# freq.xz comes from quine.B, so n1 is the size of B
comp.prop(p1=margin.table(freq.xz,c(1,2)), p2=freq.xA+freq.xB,
          n1=nrow(quine.B), n2=nrow(quine.A)+nrow(quine.B), ref=TRUE)
bounds.yz <- Frechet.bounds.cat(tab.x=freq.xA+freq.xB, tab.xy=freq.xy,
tab.xz=freq.xz, print.f="data.frame", align.margins=TRUE)
bounds.yz