apc.data.list {apc} | R Documentation |
Arrange data as an apc.data.list
Description
This is step 1 of the apc analysis.
The apc package is aimed at a range of data types. The analysis and the labelling of parameters depend on the choice of data type. In order to keep track of this choice the data first has to be arranged as an apc.data.list. The purpose of this function is to aid the user in constructing a list with the right information.
Age period cohort analysis is used in two situations: a dose-response situation, where both doses (exposure, risk set, cases) and responses (counts of deaths, outcomes) are available, and a response situation, where only a response is available. If the aim is to directly model mortality ratios (counts of deaths divided by exposure) this will be thought of as a response situation.
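As a minimal sketch of the two situations, assuming the apc package is installed, the same kind of array can be passed with or without a dose matrix. The numbers, the object names and the choice of time labels below are made up for illustration only.

```r
# Made-up 3x4 age-period arrays, for illustration only
response <- matrix(c(10, 12, 15, 11,
                     20, 22, 25, 27,
                     30, 31, 36, 40), nrow = 3, ncol = 4, byrow = TRUE)
dose <- matrix(1000, nrow = 3, ncol = 4)   # hypothetical exposure

# Mortality ratios: counts divided by exposure
rates <- response / dose

if (requireNamespace("apc", quietly = TRUE)) {
  # Dose-response situation: both response and dose are supplied
  dl.dose.response <- apc::apc.data.list(response = response, dose = dose,
                                         data.format = "AP",
                                         age1 = 25, per1 = 1955, unit = 5)
  # Response situation: only a response is supplied;
  # here the ratios themselves are treated as the response
  dl.response <- apc::apc.data.list(response = rates, data.format = "AP",
                                    age1 = 25, per1 = 1955, unit = 5)
}
```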
The apc.data.list gives sufficient information for the further analysis. It is sufficient to store this information.
It has 2 obligatory arguments, which are a response matrix and a character indicating the data format.
It also has some further optional arguments, which have certain default values.
Sometimes it may be convenient to add further arguments to the apc.data.list. This will not affect the apc analysis.
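Since the value is a list, one way to attach extra information is to append named entries. The sketch below uses a plain list as a stand-in for the value of apc.data.list, so that it runs without the package; the entry name source.note is hypothetical.

```r
# Stand-in for the value of apc.data.list(), reduced to two entries
data.list <- list(response = matrix(1:6, nrow = 2, ncol = 3),
                  data.format = "AP")

# Append a further, hypothetical entry; the existing entries are unchanged
data.list.extended <- c(data.list, list(source.note = "illustrative data"))
names(data.list.extended)
```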
apc.data.list generates default row and column names for the response and dose matrices when these are not provided by the user.
Usage
apc.data.list(response, data.format, dose=NULL,
age1=NULL, per1=NULL, coh1=NULL, unit=NULL,
per.zero=NULL, per.max=NULL,
time.adjust=NULL, label=NULL,
n.decimal=NULL)
Arguments
response |
matrix (or vector). Numbers of responses. It should have a format matching |
data.format |
character. The following options are implemented:
|
dose |
Optional. matrix or NULL. Numbers of doses. It should have same format as |
age1 |
Optional. Numeric or NULL. Time label for youngest age group. Used if |
per1 |
Optional. Numeric or NULL. Time label for oldest period group. Used if |
coh1 |
Optional. Numeric or NULL. Time label for oldest cohort group. Used if |
unit |
Optional. Numeric or NULL. Common time steps for age, period and cohort. For quarterly data use |
per.zero |
Optional. Numeric or NULL. Needed if data format is "trapezoid". |
per.max |
Optional. Numeric or NULL. Needed if data format is "trapezoid". |
time.adjust |
Optional. Numeric. Time labels are based on two of age1, per1 and coh1. The third time label is computed according to the formula age1+coh1=per1+time.adjust. Default is 0. If age1=coh1=1 it is natural to choose time.adjust=1. |
label |
Optional. Character. Useful when working with multiple data sets. Some internal functions use the first three characters of the label for identification of the two datasets. |
n.decimal |
Optional. Numeric or NULL. The labels for parameters involve a date. This is found by converting a number into a character. If the value is set to |
Details
If the user does not set values for any of age1, per1, coh1, unit then the value is set to unit.
The user can set values of age1, per1, coh1 that are incongruent. The functions only use the two of these that are relevant for the chosen data.format. Example: the data.format may be "AC" and the user sets age1, per1, but age1, coh1 are relevant for this data format. The apc.data.list then sets coh1=unit, by default, while ignoring the value for per1. Other commands such as apc.data.list.subset or apc.fit.table will internally, as default option, call the function apc.get.index. That function will, in this example, set per1 according to the values of age1 and coh1.
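The example in the preceding paragraph can be tried out as follows. This is a sketch only, assuming the apc package is installed; it prints the stored time labels rather than asserting particular values, since the exact contents of the returned lists are not spelled out here.

```r
if (requireNamespace("apc", quietly = TRUE)) {
  response <- matrix(1:35, nrow = 5, ncol = 7)
  # data.format "AC" uses age1 and coh1; per1 is supplied but not relevant
  data.list <- apc::apc.data.list(response = response, data.format = "AC",
                                  age1 = 25, per1 = 1955, unit = 5)
  # Inspect the time labels stored in the list
  str(data.list[c("age1", "per1", "coh1", "unit", "time.adjust")])
  # apc.get.index derives the index information used by other functions
  index <- apc::apc.get.index(data.list)
  str(index[c("age1", "per1", "coh1")])
}
```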
If the user does not set a value for time.adjust this is set equal to unit when the user does not specify at least two of age1, per1, coh1. Otherwise it is set to 0.
The former choice matches the values in the theory papers, where indices count 1,2,... to follow standard notation for row/column indices for matrices, so that age+coh=per+unit.
The latter choice seeks to match a real time scale the user sets according to age+coh=per.
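The relation age1+coh1=per1+time.adjust can be checked by hand. The sketch below uses a hypothetical helper, third.label, which is not part of the apc package; it simply solves the stated formula for whichever label is missing and is not claimed to reproduce the package internals.

```r
# Hypothetical helper (not part of apc):
# solve age1 + coh1 = per1 + time.adjust for the one label that is NULL
third.label <- function(age1 = NULL, per1 = NULL, coh1 = NULL, time.adjust = 0) {
  n.missing <- is.null(age1) + is.null(per1) + is.null(coh1)
  if (n.missing != 1) stop("exactly one of age1, per1, coh1 should be NULL")
  if (is.null(coh1)) return(per1 + time.adjust - age1)
  if (is.null(per1)) return(age1 + coh1 - time.adjust)
  per1 + time.adjust - coh1
}

# Real time scale: age 25 in period 1955 with time.adjust = 0
third.label(age1 = 25, per1 = 1955)                # cohort label 1930
# Matrix-index convention: age1 = coh1 = 1 with time.adjust = 1
third.label(age1 = 1, coh1 = 1, time.adjust = 1)   # period label 1
```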
Value
response |
matrix (or vector). Numbers of responses. |
dose |
matrix (or NULL). Numbers of doses. |
data.format |
character. |
age1 |
Numeric. Default is NULL. |
per1 |
Numeric. Default is NULL. |
coh1 |
Numeric. Default is NULL. |
unit |
Numeric. Default is NULL. For monthly data use |
per.zero |
Numeric. If data.format is not "trapezoid" the value is NULL. If data.format is "trapezoid" the coordinate system is in age-cohort format and this value counts how many periods are cut off. The default is |
per.max |
Numeric. If data.format is not "trapezoid" the value is NULL. If data.format is "trapezoid" the coordinate system is in age-cohort format and this value counts how many periods are included in the data array. The default is |
time.adjust |
Numeric. Default is NULL. |
label |
Character. Default is NULL. |
n.decimal |
Numeric or NULL. |
Author(s)
Bent Nielsen <bent.nielsen@nuffield.ox.ac.uk> 17 Nov 2016
References
Kuang, D., Nielsen, B. and Nielsen, J.P. (2008a) Identification of the age-period-cohort model and the extended chain ladder model. Biometrika 95, 979-986.
Nielsen, B. (2014) Deviance analysis of age-period-cohort models. Nuffield DP.
Nielsen, B. (2015) apc: An R package for age-period-cohort analysis. R Journal 7, 52-64.
See Also
The example below shows how the data.Japanese.breast.cancer data.list was generated.
Other provided data sets include data.asbestos, data.Belgian.lung.cancer, data.Italian.bladder.cancer.
A subset of the data can be selected using apc.data.list.subset.
Examples
###############
# Artificial data
# (1) Generate a 5x7 matrix and make arbitrary decisions for rest
response <- matrix(data=1:35,nrow=5,ncol=7)
data.list <- apc.data.list(response=response,data.format="AP",
    age1=25,per1=1955,coh1=NULL,unit=5,
    per.zero=NULL,per.max=NULL)
data.list
# (2) Chain Ladder data
k <- 5
v.response <- seq_len(k*(k+1)/2)
data.list <- apc.data.list(response=vector.2.triangle(v.response,k),
    data.format="CL.vector.by.row",age1=2001)
data.list
###############
# Japanese breast cancer
# This is the code used to generate the data.Japanese.breast.cancer
v.rates <- c( 0.44, 0.38, 0.46, 0.55, 0.68,
1.69, 1.69, 1.75, 2.31, 2.52,
4.01, 3.90, 4.11, 4.44, 4.80,
6.59, 6.57, 6.81, 7.79, 8.27,
8.51, 9.61, 9.96,11.68,12.51,
10.49,10.80,12.36,14.59,16.56,
11.36,11.51,12.98,14.97,17.79,
12.03,10.67,12.67,14.46,16.42,
12.55,12.03,12.10,13.81,16.46,
15.81,13.87,12.65,14.00,15.60,
17.97,15.62,15.83,15.71,16.52)
v.cases <- c( 88, 78, 101, 127, 179,
299, 330, 363, 509, 588,
596, 680, 798, 923, 1056,
874, 962, 1171, 1497, 1716,
1022, 1247, 1429, 1987, 2398,
1035, 1258, 1560, 2079, 2794,
970, 1087, 1446, 1828, 2465,
820, 861, 1126, 1549, 1962,
678, 738, 878, 1140, 1683,
640, 628, 656, 900, 1162,
497, 463, 536, 644, 865)
# see also example below for generating labels
rates <- matrix(data=v.rates,nrow=11, ncol=5,byrow=TRUE)
cases <- matrix(data=v.cases,nrow=11, ncol=5,byrow=TRUE)
# A data list is now constructed as follows
# note that list entry rates is redundant,
# but included since it represents original data
data.Japanese.breast.cancer <- apc.data.list(response=cases,
dose=cases/rates,data.format="AP",
age1=25,per1=1955,coh1=NULL,unit=5,
per.zero=NULL,per.max=NULL,time.adjust=0,
label="Japanese breast cancer")
# or when exploiting the default values
data.Japanese.breast.cancer <- apc.data.list(response=cases,
dose=cases/rates,data.format="AP",
age1=25,per1=1955,unit=5,
label="Japanese breast cancer")
###################################################
# Code for generating labels
row.names <- paste(as.character(seq(25,75,by=5)),"-",as.character(seq(29,79,by=5)),sep="")
col.names <- paste(as.character(seq(1955,1975,by=5)),"-",as.character(seq(1959,1979,by=5)),sep="")
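The label code above could be wrapped in a small helper and the labels attached to a matrix through dimnames. This is a sketch in base R only; the helper make.interval.labels and the placeholder matrix are hypothetical and not part of the apc package.

```r
# Hypothetical helper (not part of apc): labels of the form "25-29"
make.interval.labels <- function(start, n, unit) {
  lower <- seq(from = start, by = unit, length.out = n)
  paste(lower, "-", lower + unit - 1, sep = "")
}

age.labels <- make.interval.labels(start = 25, n = 11, unit = 5)
per.labels <- make.interval.labels(start = 1955, n = 5, unit = 5)

# Placeholder 11x5 matrix carrying the labels as row and column names
labelled <- matrix(0, nrow = 11, ncol = 5,
                   dimnames = list(age.labels, per.labels))
```

With labels attached in this way, a matrix such as cases from the Japanese breast cancer example would carry its age and period groups along with the counts.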