fitSmoothHazard {casebase}  R Documentation 
Fit smooth-in-time parametric hazard functions.
Description
Miettinen and Hanley (2009) explained how case-base sampling can be used to estimate smooth-in-time parametric hazard functions. The idea is to sample person-moments, which may or may not correspond to an event, and then fit the hazard using logistic regression.
Usage
fitSmoothHazard(
  formula,
  data,
  time,
  family = c("glm", "gam", "glmnet"),
  censored.indicator,
  ratio = 100,
  ...
)
fitSmoothHazard.fit(
  x,
  y,
  formula_time,
  time,
  event,
  family = c("glm", "glmnet"),
  censored.indicator,
  ratio = 100,
  ...
)
prepareX(formula, data)
Arguments
formula 
an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under Details. 
data 
a data frame, list or environment containing the variables in the
model. If not found in data, the variables are taken from

time 
a character string giving the name of the time variable. See Details. 
family 
a character string specifying the family of regression models used to fit the hazard. 
censored.indicator 
a character string of length 1 indicating which
value in 
ratio 
integer, giving the ratio of the size of the base series to that of the case series. Defaults to 100. 
... 
Additional parameters passed to fitting functions (e.g.

x 
Matrix containing covariates. 
y 
Matrix containing two columns: one corresponding to time, the other to the event type. 
formula_time 
A formula describing how the hazard depends on time. Defaults to linear. 
event 
a character string giving the name of the event variable. 
Details
The object data should either be the output of the function sampleCaseBase or the source dataset on which case-base sampling will be performed. In the latter case, it is assumed that data contains the two columns corresponding to the supplied time and event variables. The variable time is used for sampling the base series, and therefore it should represent the time variable on its original (i.e. non-transformed) scale. If time is missing, the function looks for a column named "time" in the data. Note that the event variable is inferred from formula, since it is the left-hand side.
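The role of the untransformed time variable can be seen in a minimal base R sketch of the sampling idea described by Miettinen and Hanley (2009). This is an illustration only, not the code used by fitSmoothHazard or sampleCaseBase; the simulated data and all object names (cases, base_series, cb, log_offset) are assumptions made for the example.

```r
set.seed(1)

# Simulated single-event data with administrative censoring at t = 10
n <- 200
z <- rbinom(n, 1, 0.5)
t_event <- rexp(n, rate = 0.1 * exp(0.5 * z))
time <- pmin(t_event, 10)
status <- as.integer(t_event <= 10)

# Case series: the person-moments at which an event occurred
cases <- data.frame(time = time[status == 1], z = z[status == 1], y = 1)

# Base series: b person-moments drawn uniformly from the total
# follow-up time B. A subject is picked with probability proportional
# to follow-up time, then a moment is drawn uniformly within it.
B <- sum(time)
b <- 10 * nrow(cases)  # plays the role of ratio = 10
who <- sample(n, size = b, replace = TRUE, prob = time / B)
base_series <- data.frame(time = runif(b, 0, time[who]), z = z[who], y = 0)

# Logistic regression with offset log(B / b); the linear predictor
# then estimates the log-hazard as a function of time and z
cb <- rbind(cases, base_series)
cb$log_offset <- log(B / b)
fit <- glm(y ~ time + z + offset(log_offset), data = cb, family = binomial)
coef(fit)
```

Because the base series is drawn uniformly over follow-up time, a transformed time column (e.g. log-time) would change which moments are sampled, which is why the transformation belongs in the model formula instead.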
For single-event survival analysis, it is also possible to fit the hazard function using glmnet or gam. The choice of fitting family is controlled by the parameter family. The default value is glm, which corresponds to logistic regression. For competing risk analysis, only glm and glmnet are allowed.
We also provide a matrix interface through fitSmoothHazard.fit, which mimics glm.fit. This is mostly convenient for family = "glmnet", since a formula interface becomes quickly cumbersome as the number of variables increases. In this setting, the matrix y should have two columns and contain the time and event variables (e.g. like the output of survival::Surv). We need this linear function of time in order to perform case-base sampling. Therefore, nonlinear functions of time should be specified as a one-sided formula through the argument formula_time (the left-hand side is always ignored).
prepareX is a slightly modified version of the same function from the glmnet package. It can be used to convert a data.frame to a matrix with categorical variables converted to dummy variables using one-hot encoding.
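As a rough illustration of what one-hot encoding of a data.frame looks like, the following base R sketch uses model.matrix. It is not the implementation of prepareX, and its output may differ in column names and ordering; the data frame df and its columns are hypothetical.

```r
# Hypothetical data: one numeric and one categorical covariate
df <- data.frame(
  age = c(50, 61, 47, 58),
  trt = factor(c("a", "b", "c", "a"))
)

# contrasts = FALSE requests one indicator column per factor level
# (one-hot) instead of the default treatment contrasts
x <- model.matrix(
  ~ . - 1,
  data = df,
  contrasts.arg = list(trt = contrasts(df$trt, contrasts = FALSE))
)
x
```

Each row of the resulting matrix has exactly one non-zero entry among the indicator columns for trt.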
Value
An object of class glm and lm when there is only one event of interest, or of class CompRisk, which inherits from vglm, for a competing risk analysis. As such, functions like summary, deviance and coefficients give familiar results.
Examples
# Simulate censored survival data for two outcome types from exponential
# distributions
library(data.table)
nobs <- 500
tlim <- 20
# simulation parameters
b1 <- 200
b2 <- 50
# event type 0-censored, 1-event of interest, 2-competing event
# t observed time/endpoint
# z is a binary covariate
DT <- data.table(z = rbinom(nobs, 1, 0.5))
DT[, `:=`(
  "t_event" = rweibull(nobs, 1, b1),
  "t_comp" = rweibull(nobs, 1, b2)
)]
DT[, `:=`(
  "event" = 1 * (t_event < t_comp) + 2 * (t_event >= t_comp),
  "time" = pmin(t_event, t_comp)
)]
DT[time >= tlim, `:=`("event" = 0, "time" = tlim)]
out_linear <- fitSmoothHazard(event ~ time + z, DT, ratio = 10)
out_log <- fitSmoothHazard(event ~ log(time) + z, DT, ratio = 10)
# Use GAMs
library(mgcv)
DT[event == 2, event := 1]
out_gam <- fitSmoothHazard(event ~ s(time) + z, DT,
  ratio = 10, family = "gam")