R: Synthetic dataset due to Hankin

sweets {MM}

R Documentation

Synthetic dataset due to Hankin

Description

Four objects:

sweets is a \(2\times 3\times 21\) array
sweets_tally is a length 37 vector
sweets_array is a \(2\times 3\times 37\) vector
sweets_table is a \(37\times 6\) matrix

Usage

data(sweets)

Details

Object sweets is the raw dataset; objects sweets_table and sweets_tally are processed versions which are easier to analyze.

The father of a certain family brings home nine sweets of type mm and nine sweets of type jb each day for 21 days to his children, AMH, ZJH, and AGH.

The children share the sweets amongst themselves in such a way that each child receives exactly 6 sweets.

Array sweets has dimension c(2,3,21): 2 types of sweets, 3 children, and 21 days. Thus sweets[,,1] shows that on the first day, AMH chose 0 sweets of type mm and 6 sweets of type jb; child ZJH chose 3 of each, and child AGH chose 6 sweets of type mm and 0 sweets of type jb.

Observe the constant marginal totals: the kids have the same overall number of sweets each, and there are a fixed number of each kind of sweet.
Array sweets_array has dimension c(2,3,37): 2 sweets, 3 children, and 37 possible ways of arranging a matrix with the specified marginal totals. This can be produced by allboards() of the aylmer package.
sweets_table is a dataframe with six columns, one for each combination of child and sweet, and 37 rows, each row showing a permissible arrangement. All possibilities are present. The six entries of sweets[,,1] correspond to the six elements of sweets_table[1,]; the column names are mnemonics.
sweets_tally shows how often each of the arrangements in sweets_tally was observed (that is, it's a table of the 21 observations in sweets)

Source

The Hankin family

Examples

data(sweets)

# show correspondence between sweets_table and sweets_tally:
cbind(sweets_table, sweets_tally)

# Sum the data, by sweet and child and test:
fisher.test(apply(sweets,1:2,sum))
# Not significant!




# Now test for overdispersion.
# First set up the regressors:

jj1 <- apply(sweets_array,3,tcrossprod)
jj2 <- apply(sweets_array,3, crossprod)
dim(jj1) <- c(2,2,37)
dim(jj2) <- c(3,3,37)

theta_xy <- jj1[1,2,]
  phi_ab <- jj2[1,2,]
  phi_ac <- jj2[1,3,]
  phi_bc <- jj2[2,3,]

# Now the offset:
Off <- apply(sweets_array,3,function(x){-sum(lfactorial(x))})

# Now the formula:
f <- formula(sweets_tally~ -1 + theta_xy + phi_ab + phi_ac + phi_bc)

# Now the Lindsey Poisson device:
out <- glm(formula=f, offset=Off, family=poisson)

summary(out)
#  See how the residual deviance is comparable with the degrees of freedom

[Package MM version 1.6-8 Index]