reweight.contingencytable {revengc} | R Documentation |
Reweighting a contingency table
Description
This function is used in the main function: rec.
Usage
reweight.contingencytable(observed.table, estimated.table)
Arguments
observed.table |
A censored contingency table. See Details section below formatting this data.frame. |
estimated.table |
A data.frame with uncensored row names and column names. |
Details
Format for observed.table:
The only symbols accepted for censored data are listed below. Note, less than or equal to (<= and LE) is not equivalent to less than (< and L) and greater than or equal to (>=, +, and GE) is not equivalent to greater than (> and G). Also, calculations use closed intervals.
left censoring: <, L, <=, LE
interval censoring: - or I (symbol has to be placed in the middle of the two category values)
right censoring: >, >=, +, G, GE
uncensored: no symbol (only provide category value)
The column names should be the Y category values. The first column should be the X category values and the row names can be arbitrary. The inside of the table are X * Y cross tabulation, which are either positive frequency values or probabilities. The row and column marginal totals corresponding to their X and Y category values need to be placed in this table. The top left, top right, and bottom left corners of the table should be NA or blank. The bottom right corner can be a total cross tabulation sum value, NA, or blank. The table below is a formatted example.
NA | <20 | 20-30 | >30 | NA |
<5 | 18 | 19 | 8 | 45 |
5-9 | 13 | 8 | 12 | 33 |
>=10 | 7 | 5 | 10 | 21 |
NA | 38 | 32 | 31 | NA |
Value
Interior probability cells of a censored contingency table (observed.table) is reweighted to match interior probability cells of an uncensored contingency table (estimated.table). If observed.table consist of frequencies, the reweight.contingencytable() function changes frequencies to probabilities.
Examples
## going through the coding step of rec ##
# first create contingency table
contingencytable<-matrix(1:9,
nrow = 3, ncol = 3)
rowmarginal<-apply(contingencytable,1,sum)
contingencytable<-cbind(contingencytable, rowmarginal)
colmarginal<-apply(contingencytable,2,sum)
contingencytable<-rbind(contingencytable, colmarginal)
row.names(contingencytable)[row.names(contingencytable)=="colmarginal"]<-""
contingencytable<-data.frame(c("<5", "5I9", ">9", NA), contingencytable)
colnames(contingencytable)<-c(NA,"<=19","20-30",">=31", NA)
# provided upper and lower bound values for table
# X=row and Y=column
Xlowerbound=1
Xupperbound=15
Ylowerbound=15
Yupperbound=35
# table of row marginals provides average x and phi x
row.marginal.table<-row.marginal(contingencytable)
x<-cnbinom.pars(row.marginal.table)
# table of column marginals provides average y and phi y
column.marginal.table<-column.marginal(contingencytable)
y<-cnbinom.pars(column.marginal.table)
# create row and column ranges
rowrange<-Xlowerbound:Xupperbound
colrange<-Ylowerbound:Yupperbound
library(truncdist)
# new uncensored row marginal table = truncated negative binomial distribution
# rowrange = X is distributed given a < X <= b
uncensored.row.margin<-dtrunc(rowrange, mu=x$Average, size = x$Dispersion,
a = Xlowerbound-1, b = Xupperbound, spec = "nbinom")
# new uncensored column margin table = = truncated negative binomial distribution
# colrange = Y is distributed given a < Y <= b
uncensored.column.margin<-dtrunc(colrange, mu=y$Average, size = y$Dispersion,
a = Ylowerbound-1, b = Yupperbound, spec = "nbinom")
# sum of truncated distributions equal 1
sum(uncensored.row.margin)
sum(uncensored.column.margin)
# look at the seed for this example (probabilities)
seed.output<-seedmatrix(contingencytable, Xlowerbound,
Xupperbound, Ylowerbound, Yupperbound)$Probabilities
# run mipfp
# store the new margins in a list
tgt.data<-list(uncensored.row.margin, uncensored.column.margin)
# list of dimensions of each marginal constrain
tgt.list<-list(1,2)
# calling the estimated function
## seed has to be in array format for mipfp package
## ipfp is the selected seed.estimation.method
## $p.hat gives probabilies = x.hat/sum(x.hat)
library(mipfp)
final1<-Estimate(array(seed.output,dim=c(length(Xlowerbound:Xupperbound),
length(Ylowerbound:Yupperbound))), tgt.list, tgt.data, method="ipfp")$x.hat
# filling in names of updated seed
final1<-data.frame(final1)
row.names(final1)<-Xlowerbound:Xupperbound
names(final1)<-Ylowerbound:Yupperbound
# reweight estimates to known censored interior cell probabilities
final1<-reweight.contingencytable(observed.table = contingencytable, estimated.table = final1)
# see that they sum to one
sum(final1)
# rec function outputs the same table
# default of rec seed.estimation.method is ipfp
# default of rec seed.matrix is the output of the seedmatrix() function
final2<-rec(X= contingencytable,
Xlowerbound = 1,
Xupperbound = 15,
Ylowerbound = 15,
Yupperbound = 35)
# check that both data.frame results have same values
all(final1 == final2$Probability.Estimates)