sampleCaseBase {casebase} | R Documentation |

## Create case-base dataset for use in fitting parametric hazard functions

### Description

This function implements the case-base sampling approach described in Hanley and Miettinen (2009). It can be used to fit smooth-in-time parametric functions easily via logistic regression.

### Usage

```
sampleCaseBase(
  data,
  time,
  event,
  ratio = 10,
  comprisk = FALSE,
  censored.indicator
)
```

### Arguments

`data` |
a data.frame or data.table containing the source dataset. |

`time` |
a character string giving the name of the time variable. See Details. |

`event` |
a character string giving the name of the event variable. See Details. |

`ratio` |
Integer, giving the ratio of the size of the base series to that of the case series. Defaults to 10. |

`comprisk` |
Logical. Indicates whether there are multiple event types, some of which should be treated as competing risks. |

`censored.indicator` |
a character string of length 1 indicating which value in `event` marks a censored observation. |

### Details

The base series is sampled using a multinomial scheme: individuals are sampled proportionally to their follow-up time.
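
The sampling scheme described above can be sketched in base R. This is a minimal illustration, not the package's internal code: the `toy` data frame and the variable names are hypothetical, and drawing the moment uniformly within each sampled individual's follow-up is an assumption based on the case-base approach.

```r
# Hypothetical toy data: follow-up time and event indicator per individual
set.seed(1)
toy <- data.frame(
  id    = 1:6,
  time  = c(2, 5, 1, 8, 3, 6),
  event = c(1, 0, 1, 0, 1, 0)
)

ratio <- 10
case_series <- toy[toy$event == 1, ]
b <- ratio * nrow(case_series)   # size of the base series

# Multinomial step: draw individuals (with replacement) with probability
# proportional to their follow-up time
who <- sample(nrow(toy), size = b, replace = TRUE,
              prob = toy$time / sum(toy$time))

# Assumption: within each sampled individual, the person-moment is drawn
# uniformly between 0 and that individual's follow-up time
base_series <- toy[who, ]
base_series$time  <- runif(b, min = 0, max = base_series$time)
base_series$event <- 0           # base-series moments are not events
```

Individuals with longer follow-up contribute more person-moments to the base series, which is what makes the base series representative of the study's person-time.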

It is assumed that `data` contains the two columns corresponding to the supplied time and event variables. If either the `time` or `event` argument is missing, the function looks for columns with appropriate-looking names (see `checkArgsTimeEvent`).

### Value

The function returns a dataset, with the same format as the source dataset, and where each row corresponds to a person-moment sampled from the case or the base series.

### Warning

The offset is calculated using the total follow-up time for all individuals in the study. Therefore, we need `time` to be on the original scale, not a transformed scale (e.g. logarithmic). Otherwise, the offset and the estimation will be wrong.
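
To see why the scale matters, here is a small base-R sketch. It assumes the offset takes the form log(B / b) from Hanley and Miettinen (2009), where B is the total follow-up time and b is the size of the base series; the simulated data and all variable names are hypothetical. With an intercept-only model, only the counts of case and base rows matter, so the base moments themselves are not sampled here.

```r
set.seed(42)
n    <- 2000
rate <- 0.1                        # true constant hazard (assumed for the simulation)
t_ev  <- rexp(n, rate)
time  <- pmin(t_ev, 10)            # administrative censoring at time 10
event <- as.integer(t_ev < 10)

n_cases <- sum(event)
b <- 10 * n_cases                  # base series size for ratio = 10

# Offset on the original time scale: B is the total follow-up time
B  <- sum(time)
cb <- data.frame(event = c(rep(1, n_cases), rep(0, b)),
                 offset = log(B / b))
fit <- glm(event ~ 1 + offset(offset), family = binomial, data = cb)
hazard_ok <- unname(exp(coef(fit)))      # estimates the hazard: n_cases / B

# Same fit, but with "follow-up" summed on a log-type scale (log1p is used
# so the total stays positive): the offset changes, and so does the estimate
B_log  <- sum(log1p(time))
cb_bad <- transform(cb, offset = log(B_log / b))
fit_bad <- glm(event ~ 1 + offset(offset), family = binomial, data = cb_bad)
hazard_bad <- unname(exp(coef(fit_bad)))

c(original_scale = hazard_ok, transformed_scale = hazard_bad)
```

On the original scale the exponentiated intercept is close to the simulated rate; on the transformed scale it is inflated, because the total "follow-up time" entering the offset is too small.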

### Examples

```
# Simulate censored survival data for two outcome types from
# exponential distributions
library(casebase)
library(data.table)
set.seed(12345)
nobs <- 500
tlim <- 10 # administrative censoring time
# simulation parameters: scale of each distribution
# (a Weibull with shape 1 is an exponential)
b1 <- 200
b2 <- 50
# event type 0-censored, 1-event of interest, 2-competing event
# t observed time/endpoint
# z is a binary covariate
DT <- data.table(z = rbinom(nobs, 1, 0.5))
DT[, `:=`(
  "t_event" = rweibull(nobs, 1, b1),
  "t_comp" = rweibull(nobs, 1, b2)
)]
DT[, `:=`(
  "event" = 1 * (t_event < t_comp) + 2 * (t_event >= t_comp),
  "time" = pmin(t_event, t_comp)
)]
# censor everyone still at risk at tlim
DT[time >= tlim, `:=`("event" = 0, "time" = tlim)]
out <- sampleCaseBase(DT, time = "time", event = "event", comprisk = TRUE)
```

[Package *casebase* version 0.10.5]