sir_byfutime {msSPChelpR}    R Documentation
Calculate standardized incidence ratios with custom grouping variables stratified by follow-up time
Description
Calculate standardized incidence ratios with custom grouping variables stratified by follow-up time
Usage
sir_byfutime(
  df,
  dattype = NULL,
  ybreak_vars = "none",
  xbreak_var = "none",
  futime_breaks = c(0, 0.5, 1, 5, 10, Inf),
  count_var,
  refrates_df = rates,
  calc_total_row = TRUE,
  calc_total_fu = TRUE,
  region_var = NULL,
  age_var = NULL,
  sex_var = NULL,
  year_var = NULL,
  race_var = NULL,
  site_var = NULL,
  futime_var = NULL,
  expect_missing_refstrata_df = NULL,
  alpha = 0.05,
  quiet = FALSE
)
Arguments
df
data frame in wide format
dattype
can be "zfkd", "seer" or NULL. If dattype is "seer" or "zfkd", default variable names are set for the variable arguments. Default is NULL.
ybreak_vars
variables from df by which SIRs should be stratified in the result df. Multiple variables will result in appended rows in the result df. Careful: do not choose any variables that depend on the occurrence of count_var (e.g. histology of the second cancer). If ybreak_vars = "none", no stratification is performed. Default is "none".
xbreak_var
one variable from df by which SIRs should be stratified as a second dimension in the result df. This variable is added as a second stratification dimension to ybreak_vars, and SIRs are calculated for all subpopulations formed by combinations of the x and y categories. Careful: do not choose any variables that depend on the occurrence of count_var (e.g. year of the second cancer). If xbreak_var = "none", no stratification by a second dimension is performed. Default is "none".
futime_breaks
vector that indicates split points for follow-up time groups (in years) that will be used as xbreak_var. The default c(0, .5, 1, 5, 10, Inf) results in 5 groups (up to 6 months, 6-12 months, 1-5 years, 5-10 years, 10+ years). If you don't want to split by follow-up time, use futime_breaks = "none".
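To see how the default futime_breaks partition follow-up time, the sketch below applies base R's cut() to a few made-up follow-up times. It assumes left-closed intervals [a, b); whether sir_byfutime closes intervals on the left or the right, and how it labels the groups, is not stated on this page, so treat the labels as illustrative only.

```r
# Default split points (in years) as documented for futime_breaks
futime_breaks <- c(0, 0.5, 1, 5, 10, Inf)

# Hypothetical follow-up times in years
futime <- c(0.1, 0.5, 0.9, 3, 7.5, 12)

# Assumption: intervals are closed on the left, i.e. [0, 0.5), [0.5, 1), ...
fu_group <- cut(futime, breaks = futime_breaks, right = FALSE)

levels(fu_group)  # 6 break points give 5 groups
table(fu_group)
```

With right = TRUE (the cut() default), a follow-up time of exactly 0 would fall outside every group, so boundary handling is worth checking against the function's actual output.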
count_var
variable to be counted as an observed case. Cases are usually the second cancers. Should be 1 for a case to be counted.
refrates_df
data frame in which reference rates from the general population are defined. It is assumed that refrates_df has the columns "region" for region, "sex" for biological sex, "age" for age-groups (can be single ages or 5-year brackets), "year" for time period (can be single year or 5-year brackets), "t_site" for site, and "incidence_crude_rate" for the incidence rate in the respective age/sex/year cohort. The variable "race" is additionally required if the option race_var is used. refrates_df must use the same category coding of age, sex, region, year and t_site as age_var, sex_var, region_var, year_var and site_var.
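The reference rates serve to derive expected case counts per stratum. The sketch below shows the common person-years approach on toy data: join rates to person-years by stratum and sum person_years * rate. All values are made-up placeholders, the person_years column is hypothetical, and the assumption that incidence_crude_rate is expressed per 100,000 person-years is ours; the internal computation of sir_byfutime is not shown here, so check the package vignette for the rate scale it expects.

```r
# Toy reference rates with the columns documented for refrates_df
# (placeholder values, not real registry data)
refrates <- data.frame(
  region = "A", sex = c("Female", "Male"), age = "60 - 64", year = "2000",
  t_site = "C18", incidence_crude_rate = c(50, 80)
)

# Hypothetical person-years at risk, stratified the same way
py <- data.frame(
  region = "A", sex = c("Female", "Male"), age = "60 - 64", year = "2000",
  t_site = "C18", person_years = c(2000, 1000)
)

# Join on all stratum columns; the category coding must match exactly
strata <- merge(py, refrates, by = c("region", "sex", "age", "year", "t_site"))

# Assumption: rates are per 100,000 person-years
expected <- sum(strata$person_years * strata$incidence_crude_rate / 1e5)
expected
```

This is why identical coding between refrates_df and the *_var columns matters: a stratum coded differently simply fails to join and contributes nothing to the expected count.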
calc_total_row
option to calculate a row of totals. Can be either FALSE for not adding such a row or TRUE for adding it as the first row. Default is TRUE.
calc_total_fu
option to calculate totals for follow-up time. Can be either FALSE for not adding such a column or TRUE for adding it. Default is TRUE.
region_var
variable in df that contains information on the region where the case was incident. Default is set if dattype is given.
age_var
variable in df that contains information on age-group. Default is set if dattype is given.
sex_var
variable in df that contains information on sex. Default is set if dattype is given.
year_var
variable in df that contains information on the year or year-period when the case was incident. Default is set if dattype is given.
race_var
optional argument, used if SIRs should be stratified by race. To use this option, provide the name of the variable in df that contains race information. If race_var is provided, refrates_df needs to contain the variable "race".
site_var
variable in df that contains information on the site or subsite (e.g. ICD code, SEER site code or other codes that match t_site in refrates_df) of the case diagnosis. Cases are usually the second cancers. Default is set if dattype is given.
futime_var
variable in df that contains the follow-up time per person (in years) between the date of first cancer and whichever event comes first: death, date of event (case), or end-of-follow-up date. Default is set if dattype is given.
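If your data only contain dates, a follow-up variable of this kind can be sketched as below: take the earliest of death, case diagnosis and end of follow-up, then subtract the date of first cancer. The date vectors are hypothetical, and dividing by 365.25 days per year is an assumption; the package's own preparation functions may compute follow-up differently.

```r
# Hypothetical dates for three patients (NA = event did not occur)
first_dx <- as.Date(c("2005-03-01", "2010-06-15", "2012-01-10"))
death    <- as.Date(c(NA, "2013-06-15", NA))
spc_dx   <- as.Date(c("2008-03-01", NA, NA))   # second cancer (case)
end_fu   <- as.Date(c("2015-12-31", "2015-12-31", "2015-12-31"))

# Earliest of death, case diagnosis and end of follow-up, ignoring NAs
exit_date <- pmin(death, spc_dx, end_fu, na.rm = TRUE)

# Follow-up in years; 365.25 days per year is an assumption
p_futimeyrs <- as.numeric(exit_date - first_dx) / 365.25
round(p_futimeyrs, 2)
```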
expect_missing_refstrata_df
optional argument, used if strata with missing reference rates are expected because incidence rates of 0 are not stored explicitly but are simply absent from refrates_df. It is assumed that expect_missing_refstrata_df is a data.frame that has the columns "region" for region, "sex" for biological sex, "age" for age-groups (can be single ages or 5-year brackets), "year" for time period (can be single year or 5-year brackets), and "t_site" for site. The variable "race" is additionally required if the option race_var is used. refrates_df must use the same category coding of age, sex, region, year and t_site as age_var, sex_var, region_var, year_var and site_var.
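To find out which strata actually lack a reference rate, a left join of the observed strata onto refrates_df can help. The sketch below is a diagnostic on toy data, not the function's internal check; obs_strata and all values are hypothetical. Rows returned in missing_strata have the column layout described above, but whether they are appropriate input for expect_missing_refstrata_df depends on whether a rate of 0 is really intended for them.

```r
# Hypothetical strata observed in the study data
obs_strata <- data.frame(
  region = "A", sex = c("Female", "Male", "Male"),
  age = c("60 - 64", "60 - 64", "85+"), year = "2000", t_site = "C18"
)

# Toy reference rates; the Male/85+ stratum is absent (implicit zero rate)
refrates <- data.frame(
  region = "A", sex = c("Female", "Male"), age = "60 - 64", year = "2000",
  t_site = "C18", incidence_crude_rate = c(50, 80)
)

keys <- c("region", "sex", "age", "year", "t_site")

# Left join: strata without a reference rate get NA in incidence_crude_rate
joined <- merge(obs_strata, refrates, by = keys, all.x = TRUE)
missing_strata <- joined[is.na(joined$incidence_crude_rate), keys]
missing_strata
```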
alpha
significance level for confidence interval calculations. Default is alpha = 0.05, which gives 95 percent confidence intervals.
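As an illustration of how alpha enters an SIR confidence interval, the sketch below computes SIR = observed / expected with the exact (chi-square based) Poisson interval. This is one common method; the method sir_byfutime actually uses is not stated on this page and may differ. The function name sir_exact_ci and its output column names are hypothetical.

```r
# Hypothetical helper: SIR with an exact Poisson confidence interval.
# One common method; sir_byfutime may use a different one.
sir_exact_ci <- function(observed, expected, alpha = 0.05) {
  sir <- observed / expected
  # Lower bound is 0 when no cases are observed
  lower <- ifelse(observed == 0, 0,
                  qchisq(alpha / 2, df = 2 * observed) / (2 * expected))
  upper <- qchisq(1 - alpha / 2, df = 2 * (observed + 1)) / (2 * expected)
  data.frame(observed, expected, sir, sir_lci = lower, sir_uci = upper)
}

# Hypothetical counts: 12 observed vs 8 expected second cancers
sir_exact_ci(observed = 12, expected = 8)
```

A larger alpha yields a narrower interval, e.g. alpha = 0.1 gives 90 percent limits.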
quiet
if TRUE, warnings and messages will be suppressed. Default is FALSE.
Examples
# Various preparation steps are required before you can run this function.
# Please refer to the Introduction vignette to see how to prepare your data.
## Not run:
library(msSPChelpR)
library(dplyr)  # provides the %>% pipe

usdata_wide %>%
  sir_byfutime(
    dattype = "seer",
    ybreak_vars = c("race.1", "t_dco.1"),
    xbreak_var = "none",
    futime_breaks = c(0, 1/12, 2/12, 1, 5, 10, Inf),
    count_var = "count_spc",
    refrates_df = us_refrates_icd2,
    calc_total_row = TRUE,
    calc_total_fu = TRUE,
    region_var = "registry.1",
    age_var = "fc_agegroup.1",
    sex_var = "sex.1",
    year_var = "t_yeardiag.1",
    site_var = "t_site_icd.1", # using grouping by second cancer incidence
    futime_var = "p_futimeyrs",
    alpha = 0.05)
## End(Not run)