pclm {ungroup}    R Documentation

Univariate Penalized Composite Link Model (PCLM)

Description

Fit a univariate penalized composite link model (PCLM) to ungroup binned count data, e.g. age-at-death distributions grouped in age classes.

Usage

pclm(
  x,
  y,
  nlast,
  offset = NULL,
  out.step = 1,
  ci.level = 95,
  verbose = FALSE,
  control = list()
)

Arguments

x

Vector containing the starting values of the input intervals/bins. For example, for the 3 bins [0, 5), [5, 10) and [10, 15), x is the vector c(0, 5, 10).

y

Vector with counts to be ungrouped. It must have the same length as x.

nlast

Length of the last interval. In the example above nlast would be 5.
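
As an illustration of how x and nlast together determine the bins, the following base R sketch reconstructs the bin limits for the three-bin example above. The names breaks and bins are introduced here for illustration only; they are not objects of the package.

```r
# Bin starts and length of the last bin, as in the example above
x <- c(0, 5, 10)
nlast <- 5

# Bins are left-closed and right-open: each bin ends where the next starts,
# and the last bin ends at its start plus nlast
breaks <- c(x, x[length(x)] + nlast)

bins <- data.frame(left = breaks[-length(breaks)], right = breaks[-1])
bins$width <- bins$right - bins$left
print(bins)
```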

offset

Optional offset term to calculate smooth mortality rates. A vector of the same length as x and y. See Rizzi et al. (2015) for further details.
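
For orientation, the crude rate in each bin is the count divided by the offset (exposure); supplying offset asks the model for a smooth version of such rates on the finer grid. A minimal base R sketch, using the first three bins of the data from the Examples section (deaths, exposure and crude_rate are illustrative names, not package objects):

```r
# Counts and exposures for the first three bins of the example data
deaths   <- c(294, 66, 32)
exposure <- c(114, 440, 509) * 1000

# Crude (grouped) rate per bin: events divided by exposure
crude_rate <- deaths / exposure
print(signif(crude_rate, 3))
```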

out.step

Length of estimated intervals in output. Values between 0.1 and 1 are accepted. Default: 1.
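
To get a feel for how out.step controls the resolution of the output, the sketch below counts the output intervals for the three-bin example. It assumes the output grid runs from the first value of x to the end of the last bin in steps of out.step; the bins.definition component of a fitted object remains the authoritative description. The names lower, upper, n_out and out_left are illustrative.

```r
x <- c(0, 5, 10)
nlast <- 5
out.step <- 0.5

# Range covered by the input bins
lower <- x[1]
upper <- x[length(x)] + nlast

# Number of output intervals and their left end-points (assumed layout)
n_out <- round((upper - lower) / out.step)
out_left <- lower + out.step * (seq_len(n_out) - 1)

print(n_out)
print(head(out_left))
```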

ci.level

Confidence level, in percent, used for computing confidence intervals. Default: 95.

verbose

Logical value. Indicates whether a progress bar should be shown or not. Default: FALSE.

control

List with additional parameters:

  • lambda – Smoothing parameter to be used in PCLM estimation. If lambda = NA, an algorithm searches for the optimal value.

  • kr – Knot ratio. Number of internal intervals used for defining 1 knot in B-spline basis construction. See MortSmooth_bbase.

  • deg – Degree of the splines used to create an equally spaced B-spline basis over the abscissa of the data.

  • int.lambda – If lambda is optimized, the interval to be searched must be specified. Format: vector containing the end-points.

  • diff – An integer indicating the order of differences of the components of PCLM coefficients.

  • opt.method – Selection criterion of the model. Possible values are "AIC" and "BIC".

  • max.iter – Maximum number of iterations used in the fitting procedure.

  • tol – Relative tolerance in the PCLM fitting procedure.
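
A control list covering all of the parameters above could be written as follows. The values shown are illustrative assumptions rather than documented defaults, and ctrl is a name chosen for this sketch. The model call is guarded so that the snippet also runs when the ungroup package is not installed.

```r
# Illustrative settings only; see control.pclm for the actual defaults
ctrl <- list(
  lambda     = NA,           # NA: let the algorithm search for lambda
  kr         = 2,            # internal intervals per knot
  deg        = 3,            # degree of the B-splines
  int.lambda = c(0.1, 1e5),  # search interval for lambda
  diff       = 2,            # order of differences
  opt.method = "BIC",        # "AIC" or "BIC"
  max.iter   = 1000,         # iteration limit
  tol        = 1e-3          # relative tolerance
)

if (requireNamespace("ungroup", quietly = TRUE)) {
  x <- c(0, 1, seq(5, 85, by = 5))
  y <- c(294, 66, 32, 44, 170, 284, 287, 293, 361, 600, 998,
         1572, 2529, 4637, 6161, 7369, 10481, 15293, 39016)
  M <- ungroup::pclm(x, y, nlast = 26, control = ctrl)
  print(M$smoothPar)
}
```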

Details

The PCLM method is based on the composite link model, which extends standard generalized linear models. It implements the idea that the observed counts, interpreted as realizations from Poisson distributions, are indirect observations of a finer (ungrouped) but latent sequence. This latent sequence represents the distribution of expected means on a fine resolution and has to be estimated from the aggregated data. Estimates are obtained by maximizing a penalized likelihood. This maximization is performed efficiently by a version of the iteratively reweighted least-squares algorithm. The optimal value of the smoothing parameter is chosen by minimizing the Bayesian or Akaike information criterion (BIC or AIC).
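
The estimation idea described above can be sketched in a few lines of base R. This is a simplified illustration, not the package's implementation: it uses an identity basis instead of B-splines, keeps lambda fixed instead of selecting it by AIC or BIC, and omits offsets and confidence intervals. The function pclm_sketch and all of its arguments are hypothetical names introduced here.

```r
# Simplified PCLM: y ~ Poisson(mu), mu = C %*% gamma, gamma = exp(b),
# with a difference penalty on b. Illustrative only.
pclm_sketch <- function(y, C, lambda = 1, diff_order = 2,
                        max_iter = 100, tol = 1e-8) {
  J <- ncol(C)                                  # number of fine (latent) cells
  D <- diff(diag(J), differences = diff_order)  # difference matrix
  P <- lambda * crossprod(D)                    # roughness penalty
  b <- rep(log(sum(y) / J), J)                  # flat start on the log scale
  for (it in seq_len(max_iter)) {
    gam <- exp(b)                               # latent expected counts
    mu  <- as.vector(C %*% gam)                 # expected grouped counts
    Q   <- sweep(C, 2, gam, `*`)                # derivative of mu w.r.t. b
    z   <- y - mu + as.vector(Q %*% b)          # working response
    QtW <- t(Q / mu)                            # Q' W with W = diag(1 / mu)
    b_new <- as.vector(solve(QtW %*% Q + P, QtW %*% z))
    converged <- max(abs(b_new - b)) < tol
    b <- b_new
    if (converged) break
  }
  gam <- exp(b)
  list(gamma = gam, mu = as.vector(C %*% gam), iterations = it)
}

# Three bins of width 5 mapped onto 15 unit-width cells
C <- kronecker(diag(3), matrix(1, nrow = 1, ncol = 5))
y <- c(10, 20, 15)
fit <- pclm_sketch(y, C, lambda = 1)
print(round(fit$gamma, 2))
```

The composition matrix C encodes which fine cells add up to each observed bin, so C %*% gamma gives the expected grouped counts that are compared with y in every iteration.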

Value

The output is a list with the following components:

input

A list with the arguments provided as input. Saved for convenience.

fitted

The fitted values of the PCLM model.

ci

Confidence intervals around fitted values.

goodness.of.fit

A list containing goodness of fit measures: standard errors, AIC and BIC.

smoothPar

Estimated smoothing parameters: lambda, kr and deg.

bins.definition

Additional values identifying the bin limits and locations in the input and output objects.

deep

A list of objects created in the fitting process. Useful for diagnosing possible issues.

call

An unevaluated function call, that is, an unevaluated expression which consists of the named function applied to the given arguments.
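
The components listed above can be inspected on a fitted object as sketched below. The internal structure of each component is not described on this page, so the sketch only checks the component names and prints their top-level structure with str(). The vector components is a name chosen for this illustration, and the model call is guarded so that the snippet also runs without the ungroup package installed.

```r
# Component names as documented in the Value section
components <- c("input", "fitted", "ci", "goodness.of.fit",
                "smoothPar", "bins.definition", "deep", "call")

if (requireNamespace("ungroup", quietly = TRUE)) {
  x <- c(0, 1, seq(5, 85, by = 5))
  y <- c(294, 66, 32, 44, 170, 284, 287, 293, 361, 600, 998,
         1572, 2529, 4637, 6161, 7369, 10481, 15293, 39016)
  M <- ungroup::pclm(x, y, nlast = 26)

  # Which documented components are present in the returned list?
  print(components %in% names(M))

  # Top-level structure of selected components
  str(M$ci, max.level = 1)
  str(M$goodness.of.fit, max.level = 1)
}
```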

References

Rizzi S, Gampe J, Eilers PHC (2015). “Efficient Estimation of Smooth Distributions From Coarsely Grouped Data.” American Journal of Epidemiology, 182(2), 138-147. doi:10.1093/aje/kwv020.

See Also

control.pclm, plot.pclm

Examples

# Data  
x <- c(0, 1, seq(5, 85, by = 5))
y <- c(294, 66, 32, 44, 170, 284, 287, 293, 361, 600, 998, 
       1572, 2529, 4637, 6161, 7369, 10481, 15293, 39016)
offset <- c(114, 440, 509, 492, 628, 618, 576, 580, 634, 657, 
            631, 584, 573, 619, 530, 384, 303, 245, 249) * 1000
nlast <- 26 # the size of the last interval

# Example 1 ----------------------
M1 <- pclm(x, y, nlast)
ls(M1)
summary(M1)
fitted(M1)
plot(M1)

# Example 2 ----------------------
# Ungroup into even smaller intervals
M2 <- pclm(x, y, nlast, out.step = 0.5)
head(fitted(M2))
plot(M2, type = "s")
# Note, in example 1 we are estimating intervals of length 1. In example 2 
# we are estimating intervals of length 0.5 using the same aggregate data.

# Example 3 ----------------------
# Do not optimise smoothing parameters; choose your own. Faster.
M3 <- pclm(x, y, nlast, out.step = 0.5, 
           control = list(lambda = 100, kr = 10, deg = 10))
plot(M3)

summary(M2)
summary(M3) # not the smallest BIC here, but sometimes that is not important.

# Example 4 -----------------------
# Grouped x & grouped offset (estimate death rates)
M4 <- pclm(x, y, nlast, offset)
plot(M4, type = "s")

# Example 5 -----------------------
# Grouped x & ungrouped offset (estimate death rates)

ungrouped_Ex <- pclm(x, y = offset, nlast, offset = NULL)$fitted # ungrouped offset data

M5 <- pclm(x, y, nlast, offset = ungrouped_Ex)

[Package ungroup version 1.4.4 Index]