apc.fit.model {apc} | R Documentation |
Fits an age period cohort model
Description
apc.fit.model
fits the age period cohort model as a Generalized Linear Model using glm.fit
.
The model is parametrised in terms of the canonical parameter introduced by Kuang, Nielsen and Nielsen (2008),
see also the implementation in Martinez Miranda, Nielsen and Nielsen (2015).
This parametrisation has a number of advantages: it is freely varying, it is the canonical parameter of a
regular exponential family, and it is invariant to extentions of the data matrix.
Other parametrizations can be computed using apc.identify
.
apc.fit.model
can be be used for all three age period cohort factors, or for submodels with fewer of these factors.
apc.fit.model
can be used either for mortality rates through a dose-response model or for mortality counts through a pure response model without doses/exposures.
The GLM families include Poisson regressions (with log link) and Normal/Gaussian least squares regressions.
apc.fit.table produces a deviance table for 15 combinations of the three factors and linear trends: "APC", "AP", "AC", "PC", "Ad", "Pd", "Cd", "A", "P", "C", "t", "tA", "tP", "tC", "1".
Usage
apc.fit.model(apc.data.list,model.family,model.design,apc.index=NULL,
replicate.version.1.3.1=FALSE)
apc.fit.table(apc.data.list,model.family,model.design.reference="APC",
apc.index=NULL)
Arguments
apc.data.list |
List. See |
model.family |
Character. The following options are implemented. These are used internally when
calling
|
model.design |
Character. This indicates the design choice. The following options are possible.
|
model.design.reference |
Character. This indicates the reference design choice for the deviance table. Choices are "APC","AP","AC","PC","Ad","Pd","Cd","A","P","C","t". Default is "APC". |
apc.index |
Optional. List. See |
replicate.version.1.3.1 |
Optional. Logical. Replicate error in covariance calculation for "poisson.response","od.poisson.response" in versions 1.2.3-1.3.1. Default=FALSE |
Value
apc.fit.table produces a deviance table. There are 15 rows corresponding to all possible design choices. The columns are as follows.
"-2logL" |
-2 log Likelihood up to some constant.
If the model family is Poisson or binomial (logistic)
this is the same as the |
"df.residual" |
Degrees of freedom of residual: nrow x ncol - dim(parameter). If the model.family="poisson.response" the degrees of freedom is one lower. |
"prob(>chi_sq)" |
p-value of the deviance, -2logL. Left out in Gaussian case which has no saturated model |
"LR vs APC" |
the likelihood ratio statistic against the "APC" model. |
"df" |
Degrees of freedom against the "APC" model. |
"prob(>chi_sq)" |
p-value of log likelihood ratio statistic. |
"aic" |
Akaike's "An Information Criterion", minus twice the maximized log-likelihood plus twice the
number of parameters upto a constant. It is take directly from the
|
"F" |
Only for "od.poisson.response". F statistic: Ratio of deviance for submodel divided by degrees of freedom to deviance of apc model divided by degrees of freedom. |
"prof(>F)" |
Only for "od.poisson.response". F statistic: with degrees of freedom given by differences between sub-model and apc model and between apc model and saturated model. |
apc.fit.model returns a list. The entries are as follows.
fit |
List. Values from |
apc.index |
List. Values from |
coefficients.canonical |
Matrix. For each coordinate of the canonical parameters is reported coefficient, standard deviation, z-value, which is the ratio of those, and asymptotically normal p-values.
Note, for "od.poisson.response" the reported standard errors corrected by the deviance and p-values are asymptotically t distributed, see Harnau and Nielsen (2016).
Other parametrizations can be computed using |
covariance.canonical |
Matrix. Estimated covariance matrix for canonical parameters. |
slopes |
Vector. Length three. The design matrix found by |
difdif |
Vector. Length three. The design matrix found by |
index.age |
Vector. Indices for age double difference parameters within |
index.per |
Vector. Indices for period double difference parameters within |
index.coh |
Vector. Indices for cohort double difference parameters within |
dates |
Vector. Indicates the dates for the double difference parameters within |
model.family |
Character. Argument. |
model.design |
Character. Argument. |
RSS |
Numeric. Residual sum of squares. NULL for non-gaussian families. |
sigma2 |
Numeric. Maximum likelihood estimator for variance: RSS/n. NULL for non-gaussian families. |
s2 |
Numeric. Least squares estimator for variance: RSS/df. NULL for non-gaussian families. |
n.decimal |
Numeric. From |
predictors |
Vector. Design*Estimates.
Same as the |
Note
For gaussian families deviance is defined differently in apc
and glm
.
Here it is -2 log likelihood. In glm
it is RSS.
The values for apc.fit.model
include the apc.data.list
and the apc.index
returned by
apc.get.index
.
For the poisson.response
the inference is conditional on the level, see Martinez Miranda, Nielsen and Nielsen (2015).
The coefficients.canonical
computed by apc
are therefore different from the default coefficients
computed by glm
.
For the od.poisson.response
an asymptotic theory is used that mimics the conditioning for poisson.response
.
The asymptotic distribution are, however, asymptotically t or F distributed, see Harnau and Nielsen (2017).
For the log.normal.response
standard normal theory applies for quantities on the log scale including estimators.
An asymptotic theory for quantities on the original scale is provided in Kuang and Nielsen (2018).
For coefficients
the 3rd and 4th columns have headings t value
and Pr(>|t|)
for od.poisson.response
to indicate an asymptotic t theory
and otherwise
z value
and Pr(>|z|)
to indicate an asymptotic normal theory. The labels are inherited from glm.fit
.
Author(s)
Bent Nielsen <bent.nielsen@nuffield.ox.ac.uk> 15 Aug 2018 (27 Aug 2014)
References
Harnau, J. and Nielsen (2016) Over-dispersed age-period-cohort models. To appear in Journal of the American Statistical Association. Download: Nuffield DP
Kuang, D, Nielsen B (2018) Generalized log-normal chain-ladder. mimeo Nuffield Collge.
Kuang, D., Nielsen, B. and Nielsen, J.P. (2008a) Identification of the age-period-cohort model and the extended chain ladder model. Biometrika 95, 979-986. Download: Article; Earlier version Nuffield DP.
Martinez Miranda, M.D., Nielsen, B. and Nielsen, J.P. (2015) Inference and forecasting in the age-period-cohort model with unknown exposure with an application to mesothelioma mortality. Journal of the Royal Statistical Society A 178, 29-55. Download: Article, Nuffield DP.
See Also
The fit is done using glm.fit
.
The examples below use Italian bladder cancer data, see data.Italian.bladder.cancer
and
Belgian lung cancer data, see data.Belgian.lung.cancer
.
In example 3 the design matrix is called is called using apc.get.design
.
Examples
#####################
# EXAMPLE 1 with Italian bladder cancer data
data.list <- data.Italian.bladder.cancer() # function gives data list
apc.fit.table(data.list,"poisson.dose.response")
# -2logL df.residual prob(>chi_sq) LR.vs.APC df.vs.APC prob(>chi_sq) aic
# APC 33.179 27 0.191 NA NA NA 487.624
# AP 512.514 40 0.000 479.335 13 0.000 940.958
# AC 39.390 30 0.117 6.211 3 0.102 487.835
# PC 1146.649 36 0.000 1113.470 9 0.000 1583.094
# Ad 518.543 43 0.000 485.364 16 0.000 940.988
# Pd 4041.373 49 0.000 4008.194 22 0.000 4451.818
# Cd 1155.629 39 0.000 1122.450 12 0.000 1586.074
# A 2223.800 44 0.000 2190.621 17 0.000 2644.245
# P 84323.944 50 0.000 84290.765 23 0.000 84732.389
# C 23794.205 40 0.000 23761.026 13 0.000 24222.650
# t 4052.906 52 0.000 4019.727 25 0.000 4457.351
# tA 5825.158 53 0.000 5791.979 26 0.000 6227.602
# tP 84325.758 53 0.000 84292.579 26 0.000 84728.203
# tC 33446.796 53 0.000 33413.617 26 0.000 33849.241
# 1 87313.678 54 0.000 87280.499 27 0.000 87714.123
#
# Table suggests that "APC" and "AC" fit equally well. Try both
fit.apc <- apc.fit.model(data.list,"poisson.dose.response","APC")
fit.ac <- apc.fit.model(data.list,"poisson.dose.response","AC")
# Compare the estimates: They are very similar
fit.apc$coefficients.canonical
fit.ac$coefficients.canonical
#####################
# EXAMPLE 2 with Belgian lung cancer data
# This example illustrates how to find the linear predictors
data.list <- data.Belgian.lung.cancer()
# Get an APC fit
fit.apc <- apc.fit.model(data.list,"poisson.dose.response","APC")
# The linear predictor of the fit is a vector.
# But, we would like it in the same format as the data.
# Thus create matrix of same dimension as response data
# This can be done in two ways
m.lp <- data.list$response # using original information
m.lp <- fit.apc$response # using information copied when fitting
# the fit object index.data is used to fill linear predictor in
# vector format into matrix format
m.lp[fit.apc$index.data] <-fit.apc$linear.predictors
exp(m.lp)
#####################
# EXAMPLE 3 with Belgian lung cancer data
# This example illustrates how apc.fit.model works.
data.list <- data.Belgian.lung.cancer()
# Vectorise data
index <- apc.get.index(data.list)
v.response <- data.list$response[index$index.data]
v.dose <- data.list$dose[index$index.data]
# Get design
m.design <- apc.get.design(index,"APC")$design
# Fit using glm.fit from stats package
fit.apc.glm <- glm.fit(m.design,v.response,family=poisson(link="log"),offset=log(v.dose))
# Get canonical coefficients
v.cc <- fit.apc.glm$coefficients
# Find linear predictors and express in matrix form
m.fit <- data.list$response # create matrix
m.fit[index$index.data] <- m.design
m.fit.offset <- m.fit + log(data.list$dose) # add offset
exp(m.fit.offset)
# Compare with linear.predictors from glm.fit
# difference should be zero
sum(abs(m.fit.offset[index$index.data]-fit.apc.glm$linear.predictors))
#####################
# EXAMPLE 4 with Taylor-Ashe loss data
# This example illustrates the over-dispersed poisson response model.
data <- data.loss.TA()
fit.apc.od <- apc.fit.model(data,"od.poisson.response","APC")
fit.apc.od$coefficients.canonical[1:5,]
fit.apc.no.od <- apc.fit.model(data,"poisson.response","APC")
fit.apc.no.od$coefficients.canonical[1:5,]