direct_standardization {popEpi} | R Documentation
Direct Adjusting in popEpi Using Weights
Description
Several functions in popEpi have support for direct standardization of estimates. This document explains the usage of weighting with those functions.
Details
Direct standardization is performed by computing estimates of $E$ by the set of adjusting variables $A$, to which a set of weights $W$ is applicable. The weighted average over $A$ is then the direct-adjusted estimate of $E$ ($E^*$).
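Written out, the weighted average takes the following form. Here $E_a$ denotes the estimate of $E$ in stratum $a$ of the adjusting variables and $w_a$ the corresponding weight from $W$; this subscripted notation is introduced only for illustration, and the rescaling of the weights to sum to 1 is taken from the description of how popEpi handles weights.

```latex
E^* = \sum_{a} \tilde{w}_a E_a,
\qquad
\tilde{w}_a = \frac{w_a}{\sum_{b} w_b},
\qquad
\sum_{a} \tilde{w}_a = 1
```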
To enable both quick and easy as well as more rigorous usage of direct standardization with weights, the weights arguments in popEpi can be supplied in several ways. Which of these ways are available depends on the number of adjusting variables.
The weights are always rescaled internally to sum to 1, so they do not need to be scaled in this manner before being supplied. E.g. counts of subjects in strata may be passed as weights.
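Because only the relative sizes of the weights matter, counts and proportions lead to the same adjusted estimate. The minimal base-R sketch below illustrates this with a hypothetical helper direct_adjust(), which is not part of popEpi and is not its internal implementation; the stratum-specific estimates are made up.

```r
# Hypothetical helper (not part of popEpi): weighted average of
# stratum-specific estimates, with the weights rescaled to sum to 1
direct_adjust <- function(est, w) {
  stopifnot(length(est) == length(w), all(w >= 0), sum(w) > 0)
  w <- w / sum(w)   # rescale so that the weights sum to 1
  sum(est * w)      # weighted average over the strata
}

est      <- c(0.02, 0.05, 0.10, 0.20, 0.40)  # made-up stratum-specific estimates
w_counts <- c(10, 25, 25, 20, 20)            # e.g. counts of subjects per stratum
w_props  <- c(0.1, 0.25, 0.25, 0.2, 0.2)     # the same weights as proportions

direct_adjust(est, w_counts)  # 0.1595
direct_adjust(est, w_props)   # 0.1595
```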
Basic usage - one adjusting variable
In the simple case where we are adjusting by only one variable (e.g. by age group), one can simply supply a vector of weights (here and below, FUN stands for any popEpi function that supports the weights argument):
FUN(weights = c(0.1, 0.25, 0.25, 0.2, 0.2))
which may be stored in advance:
w <- c(0.1, 0.25, 0.25, 0.2, 0.2)
FUN(weights = w)
The order of the weights matters. popEpi functions with direct adjusting enabled match the supplied weights to the adjusting variables as follows: if the adjusting variable is a factor, the order of its levels is used. Otherwise, the sorted order of its unique values is used (try sort to see how it works). For clarity and certainty we recommend using factor or numeric variables when possible. character variables should be avoided: to see why, try sort(15:9) and sort(as.character(15:9)).
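The effect of sorting on the matching order can be checked directly in base R. The sketch below uses made-up age-group labels to show how character values sort differently from numbers, and how a factor fixes the order explicitly so that the weights are matched to its levels.

```r
# How sort() orders the unique values of a non-factor adjusting variable
sort(15:9)                # numeric order: 9 10 11 12 13 14 15
sort(as.character(15:9))  # character order: "10" ... "15" "9" ("9" sorts last)

# Character age-group labels are also sorted character by character
ag <- c("0-4", "5-9", "10-14")
sort(ag)                  # "0-4" "10-14" "5-9"

# A factor fixes the order explicitly: weights would be matched to levels(agf)
agf <- factor(ag, levels = c("0-4", "5-9", "10-14"))
levels(agf)               # "0-4" "5-9" "10-14"
```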
It is also possible to supply a character string corresponding to one of the age group standardization schemes integrated into popEpi:
- 'europe_1976_18of5' - European std. population (1976), 18 age groups
- 'nordic_2000_18of5' - Nordic std. population (2000), 18 age groups
- 'world_1966_18of5' - world standard (1966), 18 age groups
- 'world_2000_18of5' - world standard (2000), 18 age groups
- 'world_2000_20of5' - world standard (2000), 20 age groups
- 'world_2000_101of1' - world standard (2000), 101 age groups
Additionally, ICSS contains international weights used in cancer survival analysis, but they are not currently usable by passing a string to weights and must be supplied by hand.
You may also supply weights = "internal" to use internally computed weights, i.e. usually simply the counts of subjects / person-time experienced in each stratum.
To use one of the standardization schemes listed above, pass its name as a string. E.g.
FUN(weights = "world_2000_18of5")
will use the world standard population from 2000 as weights for 18 age groups, which your adjusting variable is assumed to contain. In this case the adjusting variable must be coded either as a numeric variable containing 1:18 or as a factor with 18 levels (ordered from the youngest to the oldest age group).
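One way to produce such a coding from ages in years is sketched below in base R. It assumes that the 18 groups of the '_18of5' schemes are the 5-year groups 0-4, 5-9, ..., 80-84 plus 85+; check this against stdpop18 before relying on it. The ages are made up.

```r
# Assumption: 18 age groups = 0-4, 5-9, ..., 80-84, 85+
age <- c(0, 3, 4, 5, 47, 84, 85, 99)  # made-up ages in years

# Numeric coding 1:18 (1 = youngest group, 18 = 85+)
agegroup_num <- findInterval(age, seq(0, 85, by = 5))

# Equivalent factor coding with 18 levels ordered from youngest to oldest;
# intervals are closed on the left: [0,5), [5,10), ..., [85,Inf)
agegroup_fac <- cut(age, breaks = c(seq(0, 85, by = 5), Inf), right = FALSE)

agegroup_num  # 1 1 1 2 10 17 18 18
```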
More than one adjusting variable
In the case that you employ more than one adjusting variable, separate weights should be passed to match the levels of the different adjusting variables. When they are supplied correctly, "grand" weights are formed from the variable-specific weights by multiplying them together (e.g. if men have w = 0.5 and the age group 0-4 has w = 0.1, the "grand" weight for men aged 0-4 is 0.5*0.1 = 0.05). The "grand" weights are then used for adjusting after ensuring they sum to one.
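The multiplication of variable-specific weights can be reproduced in base R with outer(). The sketch below uses made-up weights that include the men/age 0-4 figures from the text; it only illustrates the arithmetic and is not popEpi's internal code.

```r
# Made-up variable-specific weights
age_w <- c(0.1, 0.25, 0.25, 0.2, 0.2)  # 5 age groups, youngest (0-4) first
sex_w <- c(men = 0.5, women = 0.5)

# "Grand" weights: product of the variable-specific weights for every
# combination of levels (rows = age groups, columns = sex)
grand_w <- outer(age_w, sex_w)
grand_w <- grand_w / sum(grand_w)  # ensure the grand weights sum to one

grand_w[1, "men"]  # men aged 0-4: 0.5 * 0.1 = 0.05
```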
When using multiple adjusting variables, you are allowed to pass either a named list of weights or a data.frame of weights. E.g.
WL <- list(agegroup = age_w, sex = sex_w)
FUN(weights = WL)
where age_w and sex_w are numeric vectors. Provided that the conditions explained in the previous section are satisfied, you may also do e.g.
WL <- list(agegroup = "world_2000_18of5", sex = sex_w)
FUN(weights = WL)
and the world standard pop is used as weights for the age groups as outlined in the previous section.
Sometimes using a data.frame can be clearer (and it is fool-proof as well). To do this, form a data.frame that repeats the levels of each adjusting variable by each level of every other adjusting variable, and assign the weights as a column named "weights". E.g.
wdf <- data.frame(sex = rep(0:1, each = 18), agegroup = rep(1:18, 2))
wdf$weights <- rbinom(36, size = 100, prob = 0.25)
FUN(weights = wdf)
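A deterministic alternative to the random example weights above is to build the same layout with base R's expand.grid() and fill the "weights" column with products of variable-specific weights. The age weights below are made up; the resulting wdf has the same rows as the rep()-based example and could be passed as in FUN(weights = wdf).

```r
# Made-up variable-specific weights: 18 age groups (youngest first), 2 sexes
age_w <- 18:1 / sum(18:1)
sex_w <- c(0.5, 0.5)  # sex coded 0 and 1

# One row per combination of levels; agegroup varies fastest, matching
# data.frame(sex = rep(0:1, each = 18), agegroup = rep(1:18, 2))
wdf <- expand.grid(agegroup = 1:18, sex = 0:1)

# Product of the variable-specific weights for each row
wdf$weights <- age_w[wdf$agegroup] * sex_w[wdf$sex + 1]
```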
If you want to use the counts of subjects in strata as the weights, one way to do this is e.g.
# count subjects in each combination of V1, V2 and V3 (count column "Freq")
wdf <- as.data.frame(table(x$V1, x$V2, x$V3))
names(wdf) <- c("V1", "V2", "V3", "weights")
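A self-contained version of the counts approach, using base R's table() on a made-up data set x with three adjusting variables V1, V2 and V3, is sketched below.

```r
set.seed(1)
# Made-up subject-level data with three adjusting variables
x <- data.frame(V1 = sample(1:3, 100, replace = TRUE),
                V2 = sample(0:1, 100, replace = TRUE),
                V3 = sample(c("a", "b"), 100, replace = TRUE))

# Cross-tabulate and convert to one row per stratum; the counts end up
# in the last column, which is renamed to "weights"
wdf <- as.data.frame(table(x$V1, x$V2, x$V3))
names(wdf) <- c("V1", "V2", "V3", "weights")
```

Note that as.data.frame() on a table returns the stratum columns as factors whose levels are the tabulated values, so V1 and V2 may need converting back to numeric if the adjusting variables in the data are numeric.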
Author(s)
Joonas Miettinen
References
Source of the Nordic standard population in 5-year age groups (also contains European & 1966 world standards): https://www-dep.iarc.fr/NORDCAN/english/glossary.htm
Source of the 1976 European standard population:
Waterhouse, J., Muir, C.S., Correa, P., Powell, J., eds. (1976). Cancer Incidence in Five Continents, Vol. III. IARC Scientific Publications, No. 15, Lyon, IARC. ISBN: 9789283211150
Source of 2000 world standard population in 1-year age groups: https://seer.cancer.gov/stdpopulations/stdpop.singleages.html
See Also
Other weights: ICSS, stdpop101, stdpop18
Other popEpi argument evaluation docs: flexible_argument